Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors

2019-05-30 Thread David Marchand
On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Thu, 30 May 2019 00:46:30 +0200
> Thomas Monjalon  wrote:
>
> > 23/05/2019 15:58, David Marchand:
> > > From: Stephen Hemminger 
> > >
> > > The fields of the internal EAL core configuration are currently
> > > laid bare as part of the API. This is not good practice and limits
> > > fixing issues with layout and sizes.
> > >
> > > Make new accessor functions for the fields used by current drivers
> > > and examples.
> > [...]
> > > +DPDK_19.08 {
> > > +   global:
> > > +
> > > +   rte_lcore_cpuset;
> > > +   rte_lcore_index;
> > > +   rte_lcore_to_cpu_id;
> > > +   rte_lcore_to_socket_id;
> > > +
> > > +} DPDK_19.05;
> > > +
> > >  EXPERIMENTAL {
> > > global:
> >
> > Just to make sure, are we OK to introduce these functions
> > as non-experimental?
>
> They were in previous releases as inlines; this patch converts them
> to real functions.
>
>
Well, yes and no.

rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
part of the ABI is fine for me.

rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
adding it to the ABI is ok for me.

rte_lcore_cpuset is new too, and still a bit obscure to me. I am not really
convinced we need it until I understand why dpaa2 and fslmc bus need to
know about this.
I might need more time to look at it, so flagging this as experimental sounds
fair to me.
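
To illustrate the accessor approach discussed in this thread, here is a
minimal, self-contained sketch in plain C. It does not use the EAL: the
demo_ names, the table contents and DEMO_MAX_LCORE are hypothetical, and
only the idea (private struct layout, public functions) is taken from the
patch description.

```c
#include <stddef.h>

#define DEMO_MAX_LCORE 4 /* hypothetical limit, for illustration only */

/* Private layout: would live in a .c file, not in a public header. */
struct demo_lcore_config {
    unsigned int cpu_id;
    unsigned int socket_id;
    int core_index; /* -1 when the lcore is not enabled */
};

/* Made-up topology used only by this sketch. */
static const struct demo_lcore_config demo_lcore_table[DEMO_MAX_LCORE] = {
    { .cpu_id = 0,  .socket_id = 0, .core_index = 0 },
    { .cpu_id = 2,  .socket_id = 0, .core_index = 1 },
    { .cpu_id = 8,  .socket_id = 1, .core_index = -1 },
    { .cpu_id = 10, .socket_id = 1, .core_index = 2 },
};

/* Public accessors: callers never see the struct layout. */
int demo_lcore_to_cpu_id(unsigned int lcore_id)
{
    if (lcore_id >= DEMO_MAX_LCORE)
        return -1;
    return (int)demo_lcore_table[lcore_id].cpu_id;
}

int demo_lcore_to_socket_id(unsigned int lcore_id)
{
    if (lcore_id >= DEMO_MAX_LCORE)
        return -1;
    return (int)demo_lcore_table[lcore_id].socket_id;
}

int demo_lcore_index(unsigned int lcore_id)
{
    if (lcore_id >= DEMO_MAX_LCORE)
        return -1;
    return demo_lcore_table[lcore_id].core_index;
}
```

A caller would write demo_lcore_to_socket_id(lcore_id) instead of reading a
struct field, so the field sizes and ordering can change later without
touching the callers.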


-- 
David Marchand


Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors

2019-05-30 Thread Thomas Monjalon
30/05/2019 09:31, David Marchand:
> On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
> step...@networkplumber.org> wrote:
> 
> > On Thu, 30 May 2019 00:46:30 +0200
> > Thomas Monjalon  wrote:
> >
> > > 23/05/2019 15:58, David Marchand:
> > > > From: Stephen Hemminger 
> > > >
> > > > The fields of the internal EAL core configuration are currently
> > > > laid bare as part of the API. This is not good practice and limits
> > > > fixing issues with layout and sizes.
> > > >
> > > > Make new accessor functions for the fields used by current drivers
> > > > and examples.
> > > [...]
> > > > +DPDK_19.08 {
> > > > +   global:
> > > > +
> > > > +   rte_lcore_cpuset;
> > > > +   rte_lcore_index;
> > > > +   rte_lcore_to_cpu_id;
> > > > +   rte_lcore_to_socket_id;
> > > > +
> > > > +} DPDK_19.05;
> > > > +
> > > >  EXPERIMENTAL {
> > > > global:
> > >
> > > Just to make sure, are we OK to introduce these functions
> > > as non-experimental?
> >
> > They were in previous releases as inlines; this patch converts them
> > to real functions.
> >
> >
> Well, yes and no.
> 
> rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
> part of the ABI is fine for me.
> 
> rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
> adding it to the ABI is ok for me.

It is used by DPAA and some tests.
I guess adding as experimental is fine too?
I'm fine with both options, I'm just trying to apply the policy
we agreed on. Does this case deserve an exception?

> rte_lcore_cpuset is new too, and still a bit obscure to me. I am not really
> convinced we need it until I understand why dpaa2 and fslmc bus need to
> know about this.
> I might need more time to look at it, so flagging this as experimental sounds
> fair to me.





Re: [dpdk-dev] [PATCH 25/25] eal: hide shared memory config

2019-05-30 Thread Burakov, Anatoly

On 29-May-19 5:40 PM, Stephen Hemminger wrote:
> On Wed, 29 May 2019 17:31:11 +0100
> Anatoly Burakov  wrote:
>
> > +static inline void
> > +rte_eal_mcfg_wait_complete(struct rte_mem_config *mcfg)
> > +{
> > +   /* wait until shared mem_config finish initialising */
> > +   while (mcfg->magic != RTE_MAGIC)
> > +           rte_pause();
> > +}
>
> Not fast path, why is this inline?

I kept existing function. Have no preference one way or the other, can
change in V2.

> > +#endif // EAL_MEMCFG_H
>
> Avoid C++ style comments.

Will fix.

--
Thanks,
Anatoly
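
As a rough illustration of the two review points above (a plain function
instead of an inline one, and C-style comments only), the wait could be
sketched as follows. This is a stand-alone approximation using C11 atomics,
not the EAL code: the demo_ names and DEMO_MAGIC are hypothetical, and the
empty loop body stands in for the rte_pause() call quoted above.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_MAGIC 19820526u /* arbitrary sentinel for this sketch */

/* Hypothetical stand-in for the shared memory config. */
struct demo_mem_config {
    _Atomic uint32_t magic;
};

/* Called by the initialising process once the config is fully set up. */
void demo_mcfg_complete(struct demo_mem_config *mcfg)
{
    atomic_store_explicit(&mcfg->magic, DEMO_MAGIC, memory_order_release);
}

/* Not on the fast path, so an ordinary (non-inline) function is enough. */
void demo_mcfg_wait_complete(struct demo_mem_config *mcfg)
{
    /* wait until the shared config has finished initialising */
    while (atomic_load_explicit(&mcfg->magic,
            memory_order_acquire) != DEMO_MAGIC)
        ; /* a real implementation would pause the CPU here */
}
```

The release store pairs with the acquire load, so everything written before
demo_mcfg_complete() is visible to a process that returns from
demo_mcfg_wait_complete().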


[dpdk-dev] [PATCH v1 0/7] add multiple cores feature to test-compress-perf

2019-05-30 Thread Tomasz Jozwiak
This patchset adds a multi-core feature to the compression perf tool.
All structures have been aligned with, and are consistent with, the
crypto perf tool. Every test case has a constructor, a runner and a
destructor, and can use multiple cores and compression devices at
the same time.

Tomasz Jozwiak (7):
  app/test-compress-perf: add weak functions for multi-cores test
  app/test-compress-perf: add ptest command line option
  app/test-compress-perf: add verification test case
  app/test-compress-perf: add benchmark test case
  doc: update dpdk-test-compress-perf description
  app/test-compress-perf: add force process termination
  doc: update release notes for 19.08

 app/test-compress-perf/Makefile   |   1 +
 app/test-compress-perf/comp_perf.h|  61 +++
 app/test-compress-perf/comp_perf_options.h|  46 +-
 app/test-compress-perf/comp_perf_options_parse.c  |  58 +-
 app/test-compress-perf/comp_perf_test_benchmark.c | 152 --
 app/test-compress-perf/comp_perf_test_benchmark.h |  25 +-
 app/test-compress-perf/comp_perf_test_common.c| 285 ++
 app/test-compress-perf/comp_perf_test_common.h|  41 ++
 app/test-compress-perf/comp_perf_test_verify.c| 136 +++--
 app/test-compress-perf/comp_perf_test_verify.h|  24 +-
 app/test-compress-perf/main.c | 630 ++
 app/test-compress-perf/meson.build|   3 +-
 doc/guides/rel_notes/release_19_08.rst|   3 +
 doc/guides/tools/comp_perf.rst|  34 +-
 14 files changed, 1033 insertions(+), 466 deletions(-)
 create mode 100644 app/test-compress-perf/comp_perf.h
 create mode 100644 app/test-compress-perf/comp_perf_test_common.c
 create mode 100644 app/test-compress-perf/comp_perf_test_common.h

-- 
2.7.4
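
The constructor / runner / destructor split described in this cover letter
can be sketched in plain C as below. The struct of three function pointers
mirrors struct cperf_test from patch 1/7; everything else (the demo_ names,
the options struct, the trivial runner) is a hypothetical stand-in, and the
per-lcore launching that the real tool needs is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tool's option structure. */
struct demo_options {
    uint32_t num_iter;
};

typedef void *(*demo_constructor_t)(uint8_t dev_id, uint16_t qp_id,
        struct demo_options *options);
typedef int (*demo_runner_t)(void *test_ctx);
typedef void (*demo_destructor_t)(void *test_ctx);

struct demo_test {
    demo_constructor_t constructor;
    demo_runner_t runner;
    demo_destructor_t destructor;
};

/* Context for one (device, queue pair), owned by exactly one worker. */
struct demo_ctx {
    uint8_t dev_id;
    uint16_t qp_id;
    struct demo_options *options;
    uint32_t iterations_done;
};

static void demo_dtor(void *test_ctx)
{
    free(test_ctx); /* free(NULL) is a no-op */
}

static void *demo_ctor(uint8_t dev_id, uint16_t qp_id,
        struct demo_options *options)
{
    struct demo_ctx *ctx;

    if (options == NULL)
        return NULL;
    ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    ctx->dev_id = dev_id;
    ctx->qp_id = qp_id;
    ctx->options = options;
    ctx->iterations_done = 0;
    return ctx;
}

static int demo_run(void *test_ctx)
{
    struct demo_ctx *ctx = test_ctx;

    while (ctx->iterations_done < ctx->options->num_iter)
        ctx->iterations_done++; /* real work would go here */
    return 0;
}

static const struct demo_test demo_testmap[] = {
    { demo_ctor, demo_run, demo_dtor },
};

/* Drive one test through its lifecycle; -1 if construction fails. */
int demo_run_test(const struct demo_test *test, uint8_t dev_id,
        uint16_t qp_id, struct demo_options *options)
{
    void *ctx = test->constructor(dev_id, qp_id, options);
    int ret;

    if (ctx == NULL)
        return -1;
    ret = test->runner(ctx);
    test->destructor(ctx);
    return ret;
}
```

Because each context is private to one worker, several workers can run the
same table entry at once without sharing mutable state; only the options
are shared, and the runner treats them as read-only.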



[dpdk-dev] [PATCH v1 1/7] app/test-compress-perf: add weak functions for multi-cores test

2019-05-30 Thread Tomasz Jozwiak
This patch adds template functions for the multi-core performance
version of compress-perf-tool.

Signed-off-by: Tomasz Jozwiak 
---
 app/test-compress-perf/Makefile  |   3 +-
 app/test-compress-perf/comp_perf.h   |  61 +++
 app/test-compress-perf/comp_perf_options.h   |  45 +-
 app/test-compress-perf/comp_perf_options_parse.c |  24 +-
 app/test-compress-perf/comp_perf_test_common.c   | 285 +++
 app/test-compress-perf/comp_perf_test_common.h   |  41 ++
 app/test-compress-perf/main.c| 624 ++-
 app/test-compress-perf/meson.build   |   3 +-
 8 files changed, 685 insertions(+), 401 deletions(-)
 create mode 100644 app/test-compress-perf/comp_perf.h
 create mode 100644 app/test-compress-perf/comp_perf_test_common.c
 create mode 100644 app/test-compress-perf/comp_perf_test_common.h

diff --git a/app/test-compress-perf/Makefile b/app/test-compress-perf/Makefile
index d20e17e..de74129 100644
--- a/app/test-compress-perf/Makefile
+++ b/app/test-compress-perf/Makefile
@@ -12,7 +12,6 @@ CFLAGS += -O3
 # all source are stored in SRCS-y
 SRCS-y := main.c
 SRCS-y += comp_perf_options_parse.c
-SRCS-y += comp_perf_test_verify.c
-SRCS-y += comp_perf_test_benchmark.c
+SRCS-y += comp_perf_test_common.c
 
 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-compress-perf/comp_perf.h 
b/app/test-compress-perf/comp_perf.h
new file mode 100644
index 000..144ad8a
--- /dev/null
+++ b/app/test-compress-perf/comp_perf.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _COMP_PERF_
+#define _COMP_PERF_
+
+#include 
+
+struct comp_test_data;
+
+typedef void  *(*cperf_constructor_t)(
+   uint8_t dev_id,
+   uint16_t qp_id,
+   struct comp_test_data *options);
+
+typedef int (*cperf_runner_t)(void *test_ctx);
+typedef void (*cperf_destructor_t)(void *test_ctx);
+
+struct cperf_test {
+   cperf_constructor_t constructor;
+   cperf_runner_t runner;
+   cperf_destructor_t destructor;
+};
+
+/* Needed for weak functions*/
+
+void *
+cperf_benchmark_test_constructor(uint8_t dev_id __rte_unused,
+uint16_t qp_id __rte_unused,
+struct comp_test_data *options __rte_unused);
+
+void
+cperf_benchmark_test_destructor(void *arg __rte_unused);
+
+int
+cperf_benchmark_test_runner(void *test_ctx __rte_unused);
+
+void *
+cperf_verify_test_constructor(uint8_t dev_id __rte_unused,
+uint16_t qp_id __rte_unused,
+struct comp_test_data *options __rte_unused);
+
+void
+cperf_verify_test_destructor(void *arg __rte_unused);
+
+int
+cperf_verify_test_runner(void *test_ctx __rte_unused);
+
+void *
+cperf_pmd_cyclecount_test_constructor(uint8_t dev_id __rte_unused,
+uint16_t qp_id __rte_unused,
+struct comp_test_data *options __rte_unused);
+
+void
+cperf_pmd_cyclecount_test_destructor(void *arg __rte_unused);
+
+int
+cperf_pmd_cyclecount_test_runner(void *test_ctx __rte_unused);
+
+#endif /* _COMP_PERF_ */
diff --git a/app/test-compress-perf/comp_perf_options.h 
b/app/test-compress-perf/comp_perf_options.h
index f87751d..79e63d5 100644
--- a/app/test-compress-perf/comp_perf_options.h
+++ b/app/test-compress-perf/comp_perf_options.h
@@ -13,6 +13,24 @@
 #define MAX_MBUF_DATA_SIZE (UINT16_MAX - RTE_PKTMBUF_HEADROOM)
 #define MAX_SEG_SIZE ((int)(MAX_MBUF_DATA_SIZE / EXPANSE_RATIO))
 
+extern const char *cperf_test_type_strs[];
+
+/* Cleanup state machine */
+enum cleanup_st {
+   ST_CLEAR = 0,
+   ST_TEST_DATA,
+   ST_COMPDEV,
+   ST_INPUT_DATA,
+   ST_MEMORY_ALLOC,
+   ST_DURING_TEST
+};
+
+enum cperf_perf_test_type {
+   CPERF_TEST_TYPE_BENCHMARK,
+   CPERF_TEST_TYPE_VERIFY,
+   CPERF_TEST_TYPE_PMDCC
+};
+
 enum comp_operation {
COMPRESS_ONLY,
DECOMPRESS_ONLY,
@@ -30,37 +48,26 @@ struct range_list {
 struct comp_test_data {
char driver_name[64];
char input_file[64];
-   struct rte_mbuf **comp_bufs;
-   struct rte_mbuf **decomp_bufs;
-   uint32_t total_bufs;
+   enum cperf_perf_test_type test;
+
uint8_t *input_data;
size_t input_data_sz;
-   uint8_t *compressed_data;
-   uint8_t *decompressed_data;
-   struct rte_mempool *comp_buf_pool;
-   struct rte_mempool *decomp_buf_pool;
-   struct rte_mempool *op_pool;
-   int8_t cdev_id;
+   uint16_t nb_qps;
uint16_t seg_sz;
uint16_t out_seg_sz;
uint16_t burst_sz;
uint32_t pool_sz;
uint32_t num_iter;
uint16_t max_sgl_segs;
+
enum rte_comp_huffman huffman_enc;
enum comp_operation test_op;
int window_sz;
-   struct range_list level;
-   /* Store TSC duration for all levels (including level 0) */
-   uint64_t comp_tsc_durat

[dpdk-dev] [PATCH v1 3/7] app/test-compress-perf: add verification test case

2019-05-30 Thread Tomasz Jozwiak
This patch adds a verification part to
compression-perf-tool as a separate test case, which can be
executed multi-threaded.

Signed-off-by: Tomasz Jozwiak 
---
 app/test-compress-perf/Makefile|   1 +
 app/test-compress-perf/comp_perf_test_verify.c | 122 ++---
 app/test-compress-perf/comp_perf_test_verify.h |  24 -
 app/test-compress-perf/main.c  |   1 +
 app/test-compress-perf/meson.build |   1 +
 5 files changed, 112 insertions(+), 37 deletions(-)

diff --git a/app/test-compress-perf/Makefile b/app/test-compress-perf/Makefile
index de74129..f54d9a4 100644
--- a/app/test-compress-perf/Makefile
+++ b/app/test-compress-perf/Makefile
@@ -12,6 +12,7 @@ CFLAGS += -O3
 # all source are stored in SRCS-y
 SRCS-y := main.c
 SRCS-y += comp_perf_options_parse.c
+SRCS-y += comp_perf_test_verify.c
 SRCS-y += comp_perf_test_common.c
 
 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-compress-perf/comp_perf_test_verify.c 
b/app/test-compress-perf/comp_perf_test_verify.c
index 28a0fe8..c2aab70 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -8,14 +8,48 @@
 #include 
 
 #include "comp_perf_test_verify.h"
+#include "comp_perf_test_common.h"
+
+void
+cperf_verify_test_destructor(void *arg)
+{
+   if (arg) {
+   comp_perf_free_memory(&((struct cperf_verify_ctx *)arg)->mem);
+   rte_free(arg);
+   }
+}
+
+void *
+cperf_verify_test_constructor(uint8_t dev_id, uint16_t qp_id,
+   struct comp_test_data *options)
+{
+   struct cperf_verify_ctx *ctx = NULL;
+
+   ctx = rte_malloc(NULL, sizeof(struct cperf_verify_ctx), 0);
+
+   if (ctx != NULL) {
+   ctx->mem.dev_id = dev_id;
+   ctx->mem.qp_id = qp_id;
+   ctx->options = options;
+
+   if (!comp_perf_allocate_memory(ctx->options, &ctx->mem) &&
+   !prepare_bufs(ctx->options, &ctx->mem))
+   return ctx;
+   }
+
+   cperf_verify_test_destructor(ctx);
+   return NULL;
+}
 
 static int
-main_loop(struct comp_test_data *test_data, uint8_t level,
-   enum rte_comp_xform_type type,
-   uint8_t *output_data_ptr,
-   size_t *output_data_sz)
+main_loop(struct cperf_verify_ctx *ctx, enum rte_comp_xform_type type)
 {
-   uint8_t dev_id = test_data->cdev_id;
+   struct comp_test_data *test_data = ctx->options;
+   uint8_t *output_data_ptr;
+   size_t *output_data_sz;
+   struct cperf_mem_resources *mem = &ctx->mem;
+
+   uint8_t dev_id = mem->dev_id;
uint32_t i, iter, num_iter;
struct rte_comp_op **ops, **deq_ops;
void *priv_xform = NULL;
@@ -33,7 +67,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
}
 
ops = rte_zmalloc_socket(NULL,
-   2 * test_data->total_bufs * sizeof(struct rte_comp_op *),
+   2 * mem->total_bufs * sizeof(struct rte_comp_op *),
0, rte_socket_id());
 
if (ops == NULL) {
@@ -42,7 +76,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
return -1;
}
 
-   deq_ops = &ops[test_data->total_bufs];
+   deq_ops = &ops[mem->total_bufs];
 
if (type == RTE_COMP_COMPRESS) {
xform = (struct rte_comp_xform) {
@@ -50,14 +84,16 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
.compress = {
.algo = RTE_COMP_ALGO_DEFLATE,
.deflate.huffman = test_data->huffman_enc,
-   .level = level,
+   .level = test_data->level,
.window_size = test_data->window_sz,
.chksum = RTE_COMP_CHECKSUM_NONE,
.hash_algo = RTE_COMP_HASH_ALGO_NONE
}
};
-   input_bufs = test_data->decomp_bufs;
-   output_bufs = test_data->comp_bufs;
+   output_data_ptr = ctx->mem.compressed_data;
+   output_data_sz = &ctx->comp_data_sz;
+   input_bufs = mem->decomp_bufs;
+   output_bufs = mem->comp_bufs;
out_seg_sz = test_data->out_seg_sz;
} else {
xform = (struct rte_comp_xform) {
@@ -69,8 +105,10 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
.hash_algo = RTE_COMP_HASH_ALGO_NONE
}
};
-   input_bufs = test_data->comp_bufs;
-   output_bufs = test_data->decomp_bufs;
+   output_data_ptr = ctx->mem.decompressed_data;
+   output_data_sz = &ctx->decomp_data_sz;
+   input_bufs = mem->comp_bufs;
+   output_bufs = mem->decomp_bufs;
out_

[dpdk-dev] [PATCH v1 2/7] app/test-compress-perf: add ptest command line option

2019-05-30 Thread Tomasz Jozwiak
This patch adds the --ptest option, which makes it possible to choose
the test case from the command line.

Signed-off-by: Tomasz Jozwiak 
---
 app/test-compress-perf/comp_perf_options_parse.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/app/test-compress-perf/comp_perf_options_parse.c 
b/app/test-compress-perf/comp_perf_options_parse.c
index bc4b98a..07672b2 100644
--- a/app/test-compress-perf/comp_perf_options_parse.c
+++ b/app/test-compress-perf/comp_perf_options_parse.c
@@ -15,6 +15,7 @@
 
 #include "comp_perf_options.h"
 
+#define CPERF_PTEST_TYPE   ("ptest")
 #define CPERF_DRIVER_NAME  ("driver-name")
 #define CPERF_TEST_FILE("input-file")
 #define CPERF_SEG_SIZE ("seg-sz")
@@ -37,6 +38,7 @@ static void
 usage(char *progname)
 {
printf("%s [EAL options] --\n"
+   " --ptest benchmark / verify : set test type\n"
" --driver-name NAME: compress driver to use\n"
" --input-file NAME: file to compress and decompress\n"
" --extended-input-sz N: extend file data up to this size 
(default: no extension)\n"
@@ -76,6 +78,37 @@ get_str_key_id_mapping(struct name_id_map *map, unsigned int 
map_len,
 }
 
 static int
+parse_cperf_test_type(struct comp_test_data *test_data, const char *arg)
+{
+   struct name_id_map cperftest_namemap[] = {
+   {
+   cperf_test_type_strs[CPERF_TEST_TYPE_BENCHMARK],
+   CPERF_TEST_TYPE_BENCHMARK
+   },
+   {
+   cperf_test_type_strs[CPERF_TEST_TYPE_VERIFY],
+   CPERF_TEST_TYPE_VERIFY
+   },
+   {
+   cperf_test_type_strs[CPERF_TEST_TYPE_PMDCC],
+   CPERF_TEST_TYPE_PMDCC
+   }
+   };
+
+   int id = get_str_key_id_mapping(
+   (struct name_id_map *)cperftest_namemap,
+   RTE_DIM(cperftest_namemap), arg);
+   if (id < 0) {
+   RTE_LOG(ERR, USER1, "failed to parse test type\n");
+   return -1;
+   }
+
+   test_data->test = (enum cperf_perf_test_type)id;
+
+   return 0;
+}
+
+static int
 parse_uint32_t(uint32_t *value, const char *arg)
 {
char *end = NULL;
@@ -499,6 +532,8 @@ struct long_opt_parser {
 };
 
 static struct option lgopts[] = {
+
+   { CPERF_PTEST_TYPE, required_argument, 0, 0 },
{ CPERF_DRIVER_NAME, required_argument, 0, 0 },
{ CPERF_TEST_FILE, required_argument, 0, 0 },
{ CPERF_SEG_SIZE, required_argument, 0, 0 },
@@ -517,6 +552,7 @@ static int
 comp_perf_opts_parse_long(int opt_idx, struct comp_test_data *test_data)
 {
struct long_opt_parser parsermap[] = {
+   { CPERF_PTEST_TYPE, parse_cperf_test_type },
{ CPERF_DRIVER_NAME,parse_driver_name },
{ CPERF_TEST_FILE,  parse_test_file },
{ CPERF_SEG_SIZE,   parse_seg_sz },
-- 
2.7.4
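
The name-to-id lookup behind --ptest can be sketched on its own as follows.
This is an approximation, not the patch's code: the body of
get_str_key_id_mapping() is not shown above, so the strcmp() loop and all
demo_ names are assumptions. The strings "benchmark" and "verify" are taken
from the usage text.

```c
#include <stddef.h>
#include <string.h>

enum demo_test_type {
    DEMO_TEST_BENCHMARK,
    DEMO_TEST_VERIFY
};

struct demo_name_id {
    const char *name;
    int id;
};

static const struct demo_name_id demo_test_names[] = {
    { "benchmark", DEMO_TEST_BENCHMARK },
    { "verify", DEMO_TEST_VERIFY },
};

/*
 * Map a command-line string to a test type.
 * Returns 0 and writes *out on success; returns -1 and leaves *out
 * untouched when the string is unknown.
 */
int demo_parse_test_type(const char *arg, enum demo_test_type *out)
{
    size_t i;
    size_t n = sizeof(demo_test_names) / sizeof(demo_test_names[0]);

    if (arg == NULL || out == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        if (strcmp(arg, demo_test_names[i].name) == 0) {
            *out = (enum demo_test_type)demo_test_names[i].id;
            return 0;
        }
    }
    return -1;
}
```

Keeping the output untouched on failure lets the caller preset a default
test type and only overwrite it when the option parses cleanly.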



[dpdk-dev] [PATCH v1 4/7] app/test-compress-perf: add benchmark test case

2019-05-30 Thread Tomasz Jozwiak
This patch adds a benchmark part to
compression-perf-tool as a separate test case, which can be
executed multi-threaded.

Signed-off-by: Tomasz Jozwiak 
---
 app/test-compress-perf/Makefile   |   1 +
 app/test-compress-perf/comp_perf_test_benchmark.c | 139 --
 app/test-compress-perf/comp_perf_test_benchmark.h |  25 +++-
 app/test-compress-perf/main.c |   1 +
 app/test-compress-perf/meson.build|   1 +
 5 files changed, 129 insertions(+), 38 deletions(-)

diff --git a/app/test-compress-perf/Makefile b/app/test-compress-perf/Makefile
index f54d9a4..d1a6820 100644
--- a/app/test-compress-perf/Makefile
+++ b/app/test-compress-perf/Makefile
@@ -13,6 +13,7 @@ CFLAGS += -O3
 SRCS-y := main.c
 SRCS-y += comp_perf_options_parse.c
 SRCS-y += comp_perf_test_verify.c
+SRCS-y += comp_perf_test_benchmark.c
 SRCS-y += comp_perf_test_common.c
 
 include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/test-compress-perf/comp_perf_test_benchmark.c 
b/app/test-compress-perf/comp_perf_test_benchmark.c
index 5752906..9b0b146 100644
--- a/app/test-compress-perf/comp_perf_test_benchmark.c
+++ b/app/test-compress-perf/comp_perf_test_benchmark.c
@@ -10,11 +10,45 @@
 
 #include "comp_perf_test_benchmark.h"
 
+void
+cperf_benchmark_test_destructor(void *arg)
+{
+   if (arg) {
+   comp_perf_free_memory(
+   &((struct cperf_benchmark_ctx *)arg)->ver.mem);
+   rte_free(arg);
+   }
+}
+
+void *
+cperf_benchmark_test_constructor(uint8_t dev_id, uint16_t qp_id,
+   struct comp_test_data *options)
+{
+   struct cperf_benchmark_ctx *ctx = NULL;
+
+   ctx = rte_malloc(NULL, sizeof(struct cperf_benchmark_ctx), 0);
+
+   if (ctx != NULL) {
+   ctx->ver.mem.dev_id = dev_id;
+   ctx->ver.mem.qp_id = qp_id;
+   ctx->ver.options = options;
+   ctx->ver.silent = 1; /* ver. part will be silent */
+
+   if (!comp_perf_allocate_memory(ctx->ver.options, &ctx->ver.mem)
+ && !prepare_bufs(ctx->ver.options, &ctx->ver.mem))
+   return ctx;
+   }
+
+   cperf_benchmark_test_destructor(ctx);
+   return NULL;
+}
+
 static int
-main_loop(struct comp_test_data *test_data, uint8_t level,
-   enum rte_comp_xform_type type)
+main_loop(struct cperf_benchmark_ctx *ctx, enum rte_comp_xform_type type)
 {
-   uint8_t dev_id = test_data->cdev_id;
+   struct comp_test_data *test_data = ctx->ver.options;
+   struct cperf_mem_resources *mem = &ctx->ver.mem;
+   uint8_t dev_id = mem->dev_id;
uint32_t i, iter, num_iter;
struct rte_comp_op **ops, **deq_ops;
void *priv_xform = NULL;
@@ -31,7 +65,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
}
 
ops = rte_zmalloc_socket(NULL,
-   2 * test_data->total_bufs * sizeof(struct rte_comp_op *),
+   2 * mem->total_bufs * sizeof(struct rte_comp_op *),
0, rte_socket_id());
 
if (ops == NULL) {
@@ -40,7 +74,7 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
return -1;
}
 
-   deq_ops = &ops[test_data->total_bufs];
+   deq_ops = &ops[mem->total_bufs];
 
if (type == RTE_COMP_COMPRESS) {
xform = (struct rte_comp_xform) {
@@ -48,14 +82,14 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
.compress = {
.algo = RTE_COMP_ALGO_DEFLATE,
.deflate.huffman = test_data->huffman_enc,
-   .level = level,
+   .level = test_data->level,
.window_size = test_data->window_sz,
.chksum = RTE_COMP_CHECKSUM_NONE,
.hash_algo = RTE_COMP_HASH_ALGO_NONE
}
};
-   input_bufs = test_data->decomp_bufs;
-   output_bufs = test_data->comp_bufs;
+   input_bufs = mem->decomp_bufs;
+   output_bufs = mem->comp_bufs;
out_seg_sz = test_data->out_seg_sz;
} else {
xform = (struct rte_comp_xform) {
@@ -67,8 +101,8 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
.hash_algo = RTE_COMP_HASH_ALGO_NONE
}
};
-   input_bufs = test_data->comp_bufs;
-   output_bufs = test_data->decomp_bufs;
+   input_bufs = mem->comp_bufs;
+   output_bufs = mem->decomp_bufs;
out_seg_sz = test_data->seg_sz;
}
 
@@ -82,13 +116,13 @@ main_loop(struct comp_test_data *test_data, uint8_t level,
 
uint64_t tsc_start, tsc_end, tsc_duration;
 
-   tsc_start = tsc_end = tsc_duration = 0;
-   tsc_start = r

[dpdk-dev] [PATCH v1 5/7] doc: update dpdk-test-compress-perf description

2019-05-30 Thread Tomasz Jozwiak
This patch updates the dpdk-test-compress-perf documentation.

Signed-off-by: Tomasz Jozwiak 
---
 doc/guides/tools/comp_perf.rst | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/doc/guides/tools/comp_perf.rst b/doc/guides/tools/comp_perf.rst
index 52869c1..71eef18 100644
--- a/doc/guides/tools/comp_perf.rst
+++ b/doc/guides/tools/comp_perf.rst
@@ -6,7 +6,9 @@ dpdk-test-compress-perf Tool
 
 The ``dpdk-test-compress-perf`` tool is a Data Plane Development Kit (DPDK)
 utility that allows measuring performance parameters of PMDs available in the
-compress tree. The tool reads the data from a file (--input-file),
+compress tree. The user can run tests on multiple cores, but only
+one type of compression PMD can be measured during a single application
+execution. The tool reads the data from a file (--input-file),
 dumps all the file into a buffer and fills out the data of input mbufs,
 which are passed to compress device with compression operations.
 Then, the output buffers are fed into the decompression stage, and the 
resulting
@@ -26,9 +28,35 @@ Limitations
 
 * Stateful operation is not supported in this version.
 
+EAL Options
+~~~~~~~~~~~
+
+The following are the EAL command-line options that can be used in conjunction
+with the ``dpdk-test-compress-perf`` application.
+See the DPDK Getting Started Guides for more information on these options.
+
+*   ``-c `` or ``-l ``
+
+   Set the hexadecimal bitmask of the cores to run on. The corelist is a
+   list of cores to use.
+
+.. Note::
+
+   One lcore is needed for process admin, tests are run on all other cores.
+   To run tests on two lcores, three lcores must be passed to the tool.
+
+*   ``-w ``
+
+   Add a PCI device to the white list.
+
+*   ``--vdev ``
+
+   Add a virtual device.
+
+Application Options
+~~~~~~~~~~~~~~~~~~~
 
-Command line options
-
+ ``--ptest [benchmark/verify]``: set test type (default: benchmark)
 
  ``--driver-name NAME``: compress driver to use
 
-- 
2.7.4



Re: [dpdk-dev] [PATCH 00/25] Make shared memory config non-public

2019-05-30 Thread Burakov, Anatoly

On 29-May-19 9:11 PM, David Marchand wrote:
> On Wed, May 29, 2019 at 6:31 PM Anatoly Burakov 
> wrote:
>
> > This patchset removes the shared memory config from public
> > API, and replaces all usages of said config with new API
> > calls.
> >
> > The patchset is mostly a search-and-replace job and should
> > be pretty easy to review. However, the changes to ENA
>
> I went and did the same job with some scripts.
>
> Not sure you really need to split in all those patches.
> We are not going to backport this.

The "separate commits" thing is made for the benefit of reviewers, not
backporters. In my experience it's much easier to get a maintainer to
review a smaller patch than it is to look through a wall of irrelevant
changes.

That said, for trivial changes such as these, maybe this is indeed
unnecessary.

> Some changes are mixed, the kni changes are in the hash: patch.

Oops, will fix, thanks for pointing it out!

> I spotted a missed qlock in:
> lib/librte_eal/common/eal_common_tailqs.c:
>   rte_rwlock_read_lock(&mcfg->qlock);
> lib/librte_eal/common/eal_common_tailqs.c:
>   rte_rwlock_read_unlock(&mcfg->qlock);
>
> On the names of the functions, could we have something shorter?
> The prefix rte_eal_mcfg_ is not necessary from my pov.

I can drop the mcfg, but IMO all of these locking functions should be
kept under one namespace, and rte_eal_ is too broad.

> > driver are of particular interest, because they're using
> > the shared memory config in a way that i find confusing.
>
> I thought the same when I looked at it before.
> Hopefully the ena maintainers will enlighten us :-).
>
> > I tried to implement the equivalent changes as well as
> > i could, but since the code doesn't make any sense to me,
> > i would really like to request help from ENA maintainers.
> >
> > Everything else should be pretty straightforward.
>
> We are missing the deprecation notice removal at the end of the series and
> a note in the release notes.

Will add. Making it into the V1 deadline was higher priority :D

> Thanks Anatoly!





--
Thanks,
Anatoly
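
For readers following the naming discussion, this is roughly what "hide the
config, expose lock calls under one namespace" can look like. It is a
stand-alone sketch, not the patchset's API: the demo_mcfg_ prefix is
hypothetical, and a C11 atomic_flag spinlock stands in for the rte_rwlock
that guards the tailq list in the lines quoted above.

```c
#include <stdatomic.h>

/* Private: callers cannot reach the lock or the data directly. */
struct demo_mem_config {
    atomic_flag qlock; /* stand-in for the tailq rwlock */
    int tailq_count;
};

static struct demo_mem_config demo_mcfg = {
    .qlock = ATOMIC_FLAG_INIT,
    .tailq_count = 0,
};

/* Public lock API: one namespace for every lock in the hidden config. */
void demo_mcfg_tailq_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&demo_mcfg.qlock,
            memory_order_acquire))
        ; /* spin until the flag is released */
}

void demo_mcfg_tailq_unlock(void)
{
    atomic_flag_clear_explicit(&demo_mcfg.qlock, memory_order_release);
}

/* Example users of the lock API. */
void demo_tailq_register(void)
{
    demo_mcfg_tailq_lock();
    demo_mcfg.tailq_count++;
    demo_mcfg_tailq_unlock();
}

int demo_tailq_count(void)
{
    int n;

    demo_mcfg_tailq_lock();
    n = demo_mcfg.tailq_count;
    demo_mcfg_tailq_unlock();
    return n;
}
```

With this shape, a missed direct use such as &mcfg->qlock simply stops
compiling once the struct definition leaves the public headers, which makes
leftovers easy to find.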


[dpdk-dev] [PATCH v1 6/7] app/test-compress-perf: add force process termination

2019-05-30 Thread Tomasz Jozwiak
This patch makes it possible to force a controlled process termination
in response to two signals: SIGTERM and SIGINT.
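
The mechanism can be sketched in standard C as below: a handler sets a flag,
and the main loop checks the flag before each burst and jumps to the common
cleanup path. This is an illustration only. The demo_ names are
hypothetical, the flag is a file-scope variable here rather than the
perf_comp_force_stop field the patch adds to struct comp_test_data, and the
enqueue/dequeue work is replaced by a comment.

```c
#include <signal.h>

/* Set asynchronously by the handler, polled by the main loop. */
static volatile sig_atomic_t demo_force_stop;

static void demo_signal_handler(int signum)
{
    if (signum == SIGINT || signum == SIGTERM)
        demo_force_stop = 1;
}

/* Install the handler for both signals; returns 0 on success. */
int demo_install_handlers(void)
{
    if (signal(SIGINT, demo_signal_handler) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, demo_signal_handler) == SIG_ERR)
        return -1;
    return 0;
}

/* Returns 0 when all iterations ran, -1 when the run was aborted. */
int demo_main_loop(unsigned int num_iter)
{
    unsigned int i;
    int res = 0;

    for (i = 0; i < num_iter; i++) {
        if (demo_force_stop)
            goto end;
        /* enqueue and dequeue one burst here */
    }

end:
    /* cleanup runs on both the normal and the aborted path */
    if (demo_force_stop)
        res = -1;
    return res;
}
```

Checking the flag only at burst boundaries keeps the handler trivial and
lets the loop release its buffers through the same exit path it uses on
success.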

Signed-off-by: Tomasz Jozwiak 
---
 app/test-compress-perf/comp_perf_options.h|  1 +
 app/test-compress-perf/comp_perf_test_benchmark.c | 13 
 app/test-compress-perf/comp_perf_test_verify.c| 14 
 app/test-compress-perf/main.c | 26 +--
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_options.h 
b/app/test-compress-perf/comp_perf_options.h
index 79e63d5..534212d 100644
--- a/app/test-compress-perf/comp_perf_options.h
+++ b/app/test-compress-perf/comp_perf_options.h
@@ -68,6 +68,7 @@ struct comp_test_data {
 
double ratio;
enum cleanup_st cleanup;
+   int perf_comp_force_stop;
 };
 
 int
diff --git a/app/test-compress-perf/comp_perf_test_benchmark.c 
b/app/test-compress-perf/comp_perf_test_benchmark.c
index 9b0b146..b38b33c 100644
--- a/app/test-compress-perf/comp_perf_test_benchmark.c
+++ b/app/test-compress-perf/comp_perf_test_benchmark.c
@@ -183,6 +183,9 @@ main_loop(struct cperf_benchmark_ctx *ctx, enum 
rte_comp_xform_type type)
ops[op_id]->private_xform = priv_xform;
}
 
+   if (unlikely(test_data->perf_comp_force_stop))
+   goto end;
+
num_enq = rte_compressdev_enqueue_burst(dev_id,
mem->qp_id, ops,
num_ops);
@@ -241,6 +244,9 @@ main_loop(struct cperf_benchmark_ctx *ctx, enum 
rte_comp_xform_type type)
 
/* Dequeue the last operations */
while (total_deq_ops < total_ops) {
+   if (unlikely(test_data->perf_comp_force_stop))
+   goto end;
+
num_deq = rte_compressdev_dequeue_burst(dev_id,
   mem->qp_id,
   deq_ops,
@@ -305,6 +311,13 @@ main_loop(struct cperf_benchmark_ctx *ctx, enum 
rte_comp_xform_type type)
rte_mempool_put_bulk(mem->op_pool, (void **)ops, allocated);
rte_compressdev_private_xform_free(dev_id, priv_xform);
rte_free(ops);
+
+   if (test_data->perf_comp_force_stop) {
+   RTE_LOG(ERR, USER1,
+ "lcore: %d Perf. test has been aborted by user\n",
+   mem->lcore_id);
+   res = -1;
+   }
return res;
 }
 
diff --git a/app/test-compress-perf/comp_perf_test_verify.c 
b/app/test-compress-perf/comp_perf_test_verify.c
index c2aab70..b2cd7a0 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -187,6 +187,9 @@ main_loop(struct cperf_verify_ctx *ctx, enum 
rte_comp_xform_type type)
ops[op_id]->private_xform = priv_xform;
}
 
+   if (unlikely(test_data->perf_comp_force_stop))
+   goto end;
+
num_enq = rte_compressdev_enqueue_burst(dev_id,
mem->qp_id, ops,
num_ops);
@@ -267,6 +270,9 @@ main_loop(struct cperf_verify_ctx *ctx, enum 
rte_comp_xform_type type)
 
/* Dequeue the last operations */
while (total_deq_ops < total_ops) {
+   if (unlikely(test_data->perf_comp_force_stop))
+   goto end;
+
num_deq = rte_compressdev_dequeue_burst(dev_id,
mem->qp_id,
deq_ops,
@@ -345,6 +351,14 @@ main_loop(struct cperf_verify_ctx *ctx, enum 
rte_comp_xform_type type)
rte_mempool_put_bulk(mem->op_pool, (void **)ops, allocated);
rte_compressdev_private_xform_free(dev_id, priv_xform);
rte_free(ops);
+
+   if (test_data->perf_comp_force_stop) {
+   RTE_LOG(ERR, USER1,
+ "lcore: %d Perf. test has been aborted by user\n",
+   mem->lcore_id);
+   res = -1;
+   }
+
return res;
 }
 
diff --git a/app/test-compress-perf/main.c b/app/test-compress-perf/main.c
index c8be84e..98acd02 100644
--- a/app/test-compress-perf/main.c
+++ b/app/test-compress-perf/main.c
@@ -2,6 +2,10 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include 
+#include 
+#include 
+
 #include 
 #include 
 #include 
@@ -42,6 +46,8 @@ static const struct cperf_test cperf_testmap[] = {
}
 };
 
+static struct comp_test_data *test_data;
+
 static int
 comp_perf_check_capabilities(struct comp_test_data *test

[dpdk-dev] [PATCH v1 7/7] doc: update release notes for 19.08

2019-05-30 Thread Tomasz Jozwiak
Added a release note entry for the test-compress-perf application.

Signed-off-by: Tomasz Jozwiak 
---
 doc/guides/rel_notes/release_19_08.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_08.rst 
b/doc/guides/rel_notes/release_19_08.rst
index b9510f9..543e7d3 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,9 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Updated test-compress-perf tool application.**
+
+  Added multiple cores feature to compression perf tool application.
 
 Removed Items
 -
-- 
2.7.4



Re: [dpdk-dev] 18.11.2 (LTS) patches review and test

2019-05-30 Thread Ian Stokes

On 5/21/2019 3:01 PM, Kevin Traynor wrote:

Hi all,

Here is a list of patches targeted for LTS release 18.11.2.

The planned date for the final release is 11th June.

Please help with testing and validation of your use cases and report
any issues/results. For the final release I will update the release
notes with fixes and reported validations.

A release candidate tarball can be found at:

 https://dpdk.org/browse/dpdk-stable/tag/?id=v18.11.2-rc1

These patches are located at branch 18.11 of dpdk-stable repo:
 https://dpdk.org/browse/dpdk-stable/

Thanks.

Kevin Traynor



Hi Kevin,

I've validated with current head OVS Master and OVS 2.11.1 with VSPERF. 
Tested with i40e (X710), i40eVF, ixgbe (82599ES), ixgbeVF, igb(I350) and 
igbVF devices.


Following tests were conducted and passed:

* vswitch_p2p_tput: vSwitch - configure switch and execute RFC2544 
throughput test.
* vswitch_p2p_cont: vSwitch - configure switch and execute RFC2544 
continuous stream test.
* vswitch_pvp_tput: vSwitch - configure switch, vnf and execute RFC2544 
throughput test.
* vswitch_pvp_cont: vSwitch - configure switch, vnf and execute RFC2544 
continuous stream test.
* ovsdpdk_hotplug_attach: Ensure successful port-add after binding a 
device to igb_uio after ovs-vswitchd is launched.

* ovsdpdk_mq_p2p_rxqs: Setup rxqs on NIC port.
* ovsdpdk_mq_pvp_rxqs: Setup rxqs on vhost user port.
* ovsdpdk_mq_pvp_rxqs_linux_bridge: Confirm traffic received over vhost 
RXQs with Linux virtio device in guest.
* ovsdpdk_mq_pvp_rxqs_testpmd: Confirm traffic received over vhost RXQs 
with DPDK device in guest.

* ovsdpdk_vhostuser_client: Test vhost-user client mode.
* ovsdpdk_vhostuser_client_reconnect: Test vhost-user client mode 
reconnect feature.

* ovsdpdk_vhostuser_server: Test vhost-user server mode.
* ovsdpdk_vhostuser_sock_dir: Verify functionality of vhost-sock-dir flag.
* ovsdpdk_vdev_add_null_pmd: Test addition of port using the null DPDK 
PMD driver.
* ovsdpdk_vdev_del_null_pmd: Test deletion of port using the null DPDK 
PMD driver.
* ovsdpdk_vdev_add_af_packet_pmd: Test addition of port using the 
af_packet DPDK PMD driver.
* ovsdpdk_vdev_del_af_packet_pmd: Test deletion of port using the 
af_packet DPDK PMD driver.
* ovsdpdk_numa: Test vhost-user NUMA support. Vhostuser PMD threads 
should migrate to the same numa slot, where QEMU is executed.
* ovsdpdk_jumbo_p2p: Ensure that jumbo frames are received, processed 
and forwarded correctly by DPDK physical ports.
* ovsdpdk_jumbo_pvp: Ensure that jumbo frames are received, processed 
and forwarded correctly by DPDK vhost-user ports.
* ovsdpdk_jumbo_p2p_upper_bound: Ensure that jumbo frames above the 
configured Rx port's MTU are not accepted.
* ovsdpdk_jumbo_mtu_upper_bound_vport: Verify that the upper bound limit 
is enforced for OvS DPDK vhost-user ports.
* ovsdpdk_rate_p2p: Ensure when a user creates a rate limiting physical 
interface that the traffic is limited to the specified policer rate in a 
p2p setup.
* ovsdpdk_rate_pvp: Ensure when a user creates a rate limiting vHost 
User interface that the traffic is limited to the specified policer rate 
in a pvp setup.
* ovsdpdk_qos_p2p: In a p2p setup, ensure when a QoS egress policer is 
created that the traffic is limited to the specified rate.
* ovsdpdk_qos_pvp: In a pvp setup, ensure when a QoS egress policer is 
created that the traffic is limited to the specified rate.

* phy2phy_scalability: LTD.Scalability.Flows.RFC2544.0PacketLoss
* phy2phy_scalability_cont: Phy2Phy Scalability Continuous Stream
* pvp_cont: PVP Continuous Stream
* pvvp_cont: PVVP Continuous Stream
* pvpv_cont: Two VMs in parallel with Continuous Stream

Regards
Ian


[dpdk-dev] [PATCH v2 0/3] add more features for AF_XDP pmd

2019-05-30 Thread Xiaolong Ye
This patch series mainly includes 3 new features for AF_XDP pmd. They
are separate, independent features; I put them in one patchset because
they have code dependencies.

1. zero copy

This patch enables `zero copy` between af_xdp umem and mbuf by using
external mbuf mechanism.

2. multi-queue

With multi-queue support, one AF_XDP pmd instance can use multiple netdev
queues.

3. busy-poll

With busy-poll, all processing occurs on a single core, performance is
better from a per-core perspective.

This patch depends on busy-poll support on the kernel side, which is
currently at RFC stage [1].

[1] https://www.spinics.net/lists/netdev/msg568337.html

V2 changes:

1. improve multi-queue support by getting the ethtool channel, so the
   driver is able to get a reasonable maximum queue number.
2. add a tiny cleanup patch to get rid of an unused struct member
3. remove the busy-poll patch as its kernel counterpart is changing; will
   update the patch later

Xiaolong Ye (3):
  net/af_xdp: enable zero copy by extbuf
  net/af_xdp: add multi-queue support
  net/af_xdp: remove unused struct member

 doc/guides/nics/af_xdp.rst  |   4 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c | 285 
 2 files changed, 213 insertions(+), 76 deletions(-)

-- 
2.17.1



[dpdk-dev] [PATCH v2 1/3] net/af_xdp: enable zero copy by extbuf

2019-05-30 Thread Xiaolong Ye
Implement zero copy of af_xdp pmd through mbuf's external memory
mechanism to achieve high performance.

This patch also provides a new parameter "pmd_zero_copy" so that users
can choose whether to enable zero copy in the af_xdp pmd.

To be clear, "zero copy" here is different from the "zero copy mode" of
AF_XDP; it refers to zero copy between the af_xdp umem and the mbuf used
in the dpdk application.
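
As a rough illustration of the address bookkeeping this relies on, here is
a minimal sketch in plain C. It assumes the umem is one contiguous region
and that a buffer is handed back by converting its virtual address into an
offset from the region base. The names umem_region, umem_contains and
umem_addr_to_offset are hypothetical and are not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the memzone that backs the umem. */
struct umem_region {
	uint64_t base; /* start address of the region */
	uint64_t len;  /* region length in bytes */
};

/* True if addr lies inside [base, base + len). */
static bool
umem_contains(const struct umem_region *r, uint64_t addr)
{
	return addr >= r->base && addr - r->base < r->len;
}

/*
 * Convert a buffer address into an offset relative to the region base.
 * Returns 0 on success, -1 if the address is outside the region.
 */
static int
umem_addr_to_offset(const struct umem_region *r, uint64_t addr,
		    uint64_t *off)
{
	if (!umem_contains(r, addr))
		return -1;
	*off = addr - r->base;
	return 0;
}
```

The range check is written as addr - base < len so that it cannot
overflow when base + len is close to the top of the address space.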

Suggested-by: Varghese Vipin 
Suggested-by: Tummala Sivaprasad 
Suggested-by: Olivier Matz 
Signed-off-by: Xiaolong Ye 
---
 doc/guides/nics/af_xdp.rst  |   1 +
 drivers/net/af_xdp/rte_eth_af_xdp.c | 104 +---
 2 files changed, 79 insertions(+), 26 deletions(-)

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 175038153..0bd4239fe 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -28,6 +28,7 @@ The following options can be provided to set up an af_xdp port in DPDK.
 
 *   ``iface`` - name of the Kernel interface to attach to (required);
 *   ``queue`` - netdev queue id (optional, default 0);
+*   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
 
 Prerequisites
 -------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 35c72272c..014cd5691 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -70,6 +70,7 @@ struct xsk_umem_info {
struct xsk_umem *umem;
struct rte_ring *buf_ring;
const struct rte_memzone *mz;
+   int pmd_zc;
 };
 
 struct rx_stats {
@@ -109,8 +110,8 @@ struct pmd_internals {
int if_index;
char if_name[IFNAMSIZ];
uint16_t queue_idx;
+   int pmd_zc;
struct ether_addr eth_addr;
-   struct xsk_umem_info *umem;
struct rte_mempool *mb_pool_share;
 
struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
@@ -119,10 +120,12 @@ struct pmd_internals {
 
 #define ETH_AF_XDP_IFACE_ARG   "iface"
 #define ETH_AF_XDP_QUEUE_IDX_ARG   "queue"
+#define ETH_AF_XDP_PMD_ZC_ARG  "pmd_zero_copy"
 
 static const char * const valid_arguments[] = {
ETH_AF_XDP_IFACE_ARG,
ETH_AF_XDP_QUEUE_IDX_ARG,
+   ETH_AF_XDP_PMD_ZC_ARG,
NULL
 };
 
@@ -166,6 +169,15 @@ reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
return 0;
 }
 
+static void
+umem_buf_release_to_fq(void *addr, void *opaque)
+{
+   struct xsk_umem_info *umem = (struct xsk_umem_info *)opaque;
+   uint64_t umem_addr = (uint64_t)addr - umem->mz->addr_64;
+
+   rte_ring_enqueue(umem->buf_ring, (void *)umem_addr);
+}
+
 static uint16_t
 eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
@@ -175,6 +187,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
struct xsk_ring_prod *fq = &umem->fq;
uint32_t idx_rx = 0;
uint32_t free_thresh = fq->size >> 1;
+   int pmd_zc = umem->pmd_zc;
struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
unsigned long dropped = 0;
unsigned long rx_bytes = 0;
@@ -197,19 +210,29 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint64_t addr;
uint32_t len;
void *pkt;
+   uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE;
+   struct rte_mbuf_ext_shared_info *shinfo;
 
desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
addr = desc->addr;
len = desc->len;
pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr);
 
-   rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+   if (pmd_zc) {
+   shinfo = rte_pktmbuf_ext_shinfo_init_helper(pkt,
+   &buf_len, umem_buf_release_to_fq, umem);
+
+   rte_pktmbuf_attach_extbuf(mbufs[i], pkt, 0, buf_len,
+ shinfo);
+   } else {
+   rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+   pkt, len);
+   rte_ring_enqueue(umem->buf_ring, (void *)addr);
+   }
rte_pktmbuf_pkt_len(mbufs[i]) = len;
rte_pktmbuf_data_len(mbufs[i]) = len;
rx_bytes += len;
bufs[i] = mbufs[i];
-
-   rte_ring_enqueue(umem->buf_ring, (void *)addr);
}
 
xsk_ring_cons__release(rx, rcvd);
@@ -262,12 +285,21 @@ kick_tx(struct pkt_tx_queue *txq)
pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
 }
 
+static inline bool
+in_umem_range(struct xsk_umem_info *umem, uint64_t addr)
+{
+   uint64_t mz_base_addr = umem->mz->addr_64;
+
+   return addr >= mz_base_addr && addr < mz_base_addr + umem->mz->len;
+}
+
 static uint16_t
 eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)

[dpdk-dev] [PATCH v2 2/3] net/af_xdp: add multi-queue support

2019-05-30 Thread Xiaolong Ye
This patch adds two parameters `start_queue` and `queue_count` to
specify the range of netdev queues used by AF_XDP pmd.
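
A minimal sketch of the kind of range check these two parameters imply,
assuming the device reports some maximum queue count. The helper name
queue_range_valid and its max_queues argument are hypothetical and are not
taken from the patch.

```c
#include <stdbool.h>

/*
 * Check that queues [start_queue, start_queue + queue_count) fit inside
 * a device that exposes max_queues netdev queues.
 */
static bool
queue_range_valid(int start_queue, int queue_count, int max_queues)
{
	if (start_queue < 0 || queue_count <= 0 || max_queues <= 0)
		return false;
	if (start_queue >= max_queues)
		return false;
	/* Compare against the remaining room to avoid integer overflow. */
	return queue_count <= max_queues - start_queue;
}
```

With the documented defaults (start_queue 0, queue_count 1), any device
with at least one queue passes this check.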

Signed-off-by: Xiaolong Ye 
---
 doc/guides/nics/af_xdp.rst  |   3 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c | 194 
 2 files changed, 141 insertions(+), 56 deletions(-)

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 0bd4239fe..18defcda3 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -27,7 +27,8 @@ Options
 The following options can be provided to set up an af_xdp port in DPDK.
 
 *   ``iface`` - name of the Kernel interface to attach to (required);
-*   ``queue`` - netdev queue id (optional, default 0);
+*   ``start_queue`` - starting netdev queue id (optional, default 0);
+*   ``queue_count`` - total netdev queue number (optional, default 1);
 *   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
 
 Prerequisites
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 014cd5691..f56aabcae 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "af_xdp_deps.h"
 #include 
 
@@ -57,12 +59,12 @@ static int af_xdp_logtype;
 #define ETH_AF_XDP_NUM_BUFFERS 4096
 #define ETH_AF_XDP_DATA_HEADROOM   0
 #define ETH_AF_XDP_DFLT_NUM_DESCS  XSK_RING_CONS__DEFAULT_NUM_DESCS
-#define ETH_AF_XDP_DFLT_QUEUE_IDX  0
+#define ETH_AF_XDP_DFLT_START_QUEUE_IDX    0
+#define ETH_AF_XDP_DFLT_QUEUE_COUNT        1
 
 #define ETH_AF_XDP_RX_BATCH_SIZE   32
 #define ETH_AF_XDP_TX_BATCH_SIZE   32
 
-#define ETH_AF_XDP_MAX_QUEUE_PAIRS 16
 
 struct xsk_umem_info {
struct xsk_ring_prod fq;
@@ -88,7 +90,7 @@ struct pkt_rx_queue {
struct rx_stats stats;
 
struct pkt_tx_queue *pair;
-   uint16_t queue_idx;
+   int xsk_queue_idx;
 };
 
 struct tx_stats {
@@ -103,28 +105,34 @@ struct pkt_tx_queue {
struct tx_stats stats;
 
struct pkt_rx_queue *pair;
-   uint16_t queue_idx;
+   int xsk_queue_idx;
 };
 
 struct pmd_internals {
int if_index;
char if_name[IFNAMSIZ];
-   uint16_t queue_idx;
+   int start_queue_idx;
+   int queue_cnt;
+   int max_queue_cnt;
+   int combined_queue_cnt;
+
int pmd_zc;
struct ether_addr eth_addr;
struct rte_mempool *mb_pool_share;
 
-   struct pkt_rx_queue rx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
-   struct pkt_tx_queue tx_queues[ETH_AF_XDP_MAX_QUEUE_PAIRS];
+   struct pkt_rx_queue *rx_queues;
+   struct pkt_tx_queue *tx_queues;
 };
 
 #define ETH_AF_XDP_IFACE_ARG   "iface"
-#define ETH_AF_XDP_QUEUE_IDX_ARG   "queue"
+#define ETH_AF_XDP_START_QUEUE_ARG "start_queue"
+#define ETH_AF_XDP_QUEUE_COUNT_ARG "queue_count"
 #define ETH_AF_XDP_PMD_ZC_ARG  "pmd_zero_copy"
 
 static const char * const valid_arguments[] = {
ETH_AF_XDP_IFACE_ARG,
-   ETH_AF_XDP_QUEUE_IDX_ARG,
+   ETH_AF_XDP_START_QUEUE_ARG,
+   ETH_AF_XDP_QUEUE_COUNT_ARG,
ETH_AF_XDP_PMD_ZC_ARG,
NULL
 };
@@ -394,8 +402,8 @@ eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->if_index = internals->if_index;
dev_info->max_mac_addrs = 1;
dev_info->max_rx_pktlen = ETH_FRAME_LEN;
-   dev_info->max_rx_queues = 1;
-   dev_info->max_tx_queues = 1;
+   dev_info->max_rx_queues = internals->queue_cnt;
+   dev_info->max_tx_queues = internals->queue_cnt;
 
dev_info->min_mtu = ETHER_MIN_MTU;
dev_info->max_mtu = ETH_AF_XDP_FRAME_SIZE - ETH_AF_XDP_DATA_HEADROOM;
@@ -412,21 +420,23 @@ eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
struct pmd_internals *internals = dev->data->dev_private;
struct xdp_statistics xdp_stats;
struct pkt_rx_queue *rxq;
+   struct pkt_tx_queue *txq;
socklen_t optlen;
int i, ret;
 
for (i = 0; i < dev->data->nb_rx_queues; i++) {
optlen = sizeof(struct xdp_statistics);
rxq = &internals->rx_queues[i];
-   stats->q_ipackets[i] = internals->rx_queues[i].stats.rx_pkts;
-   stats->q_ibytes[i] = internals->rx_queues[i].stats.rx_bytes;
+   txq = rxq->pair;
+   stats->q_ipackets[i] = rxq->stats.rx_pkts;
+   stats->q_ibytes[i] = rxq->stats.rx_bytes;
 
-   stats->q_opackets[i] = internals->tx_queues[i].stats.tx_pkts;
-   stats->q_obytes[i] = internals->tx_queues[i].stats.tx_bytes;
+   stats->q_opackets[i] = txq->stats.tx_pkts;
+   stats->q_obytes[i] = txq->stats.tx_bytes;
 
stats->ipackets += stats->q_ipackets[i];
stats->ibytes += stats->q_ibytes[i];
-   stats->imissed += internals->rx_queue

[dpdk-dev] [PATCH v2 3/3] net/af_xdp: remove unused struct member

2019-05-30 Thread Xiaolong Ye
Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
Cc: sta...@dpdk.org

Signed-off-by: Xiaolong Ye 
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index f56aabcae..fc25d245b 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -118,7 +118,6 @@ struct pmd_internals {
 
int pmd_zc;
struct ether_addr eth_addr;
-   struct rte_mempool *mb_pool_share;
 
struct pkt_rx_queue *rx_queues;
struct pkt_tx_queue *tx_queues;
-- 
2.17.1



Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Ilya Maximets
On 29.05.2019 23:15, Michael Santana Francisco wrote:
> On 5/29/19 12:39 PM, Ilya Maximets wrote:
>> The first thing many developers do before start building DPDK is
>> disabling all the not needed drivers and libraries. This happens
>> just because more than a half of DPDK drivers and libraries are not
>> needed for the particular reason. For example, you don't need
>> dpaa*, octeon*, various crypto devices, eventdev, etc. if you
>> only want to build OVS for x86_64 with static linking.
>>
>> By disabling everything you don't need, build speeds up literally 10x.
>> times. This is important for CI systems. For example, TravisCI wastes
>> 10 minutes for the default DPDK build just to check linking with OVS.
>>
>> Another thing is the binary size. Number of DPDK libraries and,
>> as a result, size of resulted statically linked application decreases
>> significantly.
>>
>> Important thing also that you're able to not install some dependencies
>> if you don't have them on a target platform. Just disable libs/drivers
>> that depends on it. Similar thing for the glibc version mismatch
>> between build and target platforms.
>>
>> Also, I have to note that less code means less probability of
>> failures and less number of attack vectors.
>>
>> This patch gives 'meson' the power of configurability that we
>> have with 'make'. Using new options it's possible to enable just
>> what you need and nothing more.
>>
>> For example, following cmdline could be used to build almost minimal
>> set of DPDK libs and drivers to check OVS build:
>>
>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \
>> -Ddrivers_bus=pci,vdev  \
>> -Ddrivers_mempool=ring  \
>> -Ddrivers_net=null,virtio,ring  \
>> -Ddrivers_crypto=virtio \
>> -Ddrivers_compress=none \
>> -Ddrivers_event=none\
>> -Ddrivers_baseband=none \
>> -Ddrivers_raw=none  \
>> -Ddrivers_common=none   \
>> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\
>>ethdev,pci,hash,cryptodev,pdump,vhost \
>> -Dapps=none
>>
>> Adding a few real net drivers will give configuration that can be used
>> in production environment.
>>
>> Looks not very pretty, but this could be moved to a script.
>>
>> Build details:
>>
>>   Build targets in project: 57
>>
>>   $ time ninja
>>   real    0m11,528s
>>   user    1m4,137s
>>   sys     0m4,935s
>>
>>   $ du -sh ../dpdk_meson_install/
>>   3,5M    ../dpdk_meson_install/
>>
>> To compare with what we have without these options:
>>
>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
>>   Build targets in project: 434
>>
>>   $ time ninja
>>   real    1m38,963s
>>   user    10m18,624s
>>   sys     0m45,478s
>>
>>   $ du -sh ../dpdk_meson_install/
>>   27M ../dpdk_meson_install/
>>
>> 10x speed up for the user time.
>> 7.7 times size decrease.
>>
>> This is probably not much user-friendly because it's not a Kconfig
>> and dependency tracking in meson is really poor, so it requires
>> usually few iterations to pick correct set of libraries to satisfy
>> all dependencies. However, it's not a big deal. Options intended
>> for a proficient users who knows what they need.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  app/meson.build  |  5 +
>>  drivers/baseband/meson.build |  5 +
>>  drivers/bus/meson.build  |  6 ++
>>  drivers/common/meson.build   |  6 ++
>>  drivers/compress/meson.build |  5 +
>>  drivers/crypto/meson.build   |  5 +
>>  drivers/event/meson.build|  6 ++
>>  drivers/mempool/meson.build  |  6 ++
>>  drivers/net/meson.build  |  6 ++
>>  drivers/raw/meson.build  |  6 ++
>>  lib/meson.build  |  5 +
>>  meson_options.txt| 22 ++
>>  12 files changed, 83 insertions(+)
>>
>> diff --git a/app/meson.build b/app/meson.build
>> index 2b9fdef74..48972954c 100644
>> --- a/app/meson.build
>> +++ b/app/meson.build
>> @@ -17,6 +17,11 @@ apps = [
>>  'test-pipeline',
>>  'test-pmd']
>>  
>> +enabled_apps = get_option('apps')
>> +if enabled_apps != 'all'
>> +apps = (enabled_apps == 'none') ? [] : enabled_apps.split(',')
>> +endif
>> +
>>  # for BSD only
>>  lib_execinfo = cc.find_library('execinfo', required: false)
>>  
>> diff --git a/drivers/baseband/meson.build b/drivers/baseband/meson.build
>> index 52489df35..fabc80fc2 100644
>> --- a/drivers/baseband/meson.build
>> +++ b/drivers/baseband/meson.build
>> @@ -3,5 +3,10 @@
>>  
>>  drivers = ['null']
>>  
>> +enabled_drivers = get_option('drivers_baseband')
>> +if enabled_drivers != 'all'
>> +drivers = (enabled_drivers == 'none') ? [] : enabled_drivers.split(',')
>> +endif
>> +
>>  config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
>>  driver_name_fmt = 'rte_pmd_@0@'
>> diff -

Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Ilya Maximets
On 29.05.2019 23:37, Luca Boccassi wrote:
> On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote:
>> The first thing many developers do before start building DPDK is
>> disabling all the not needed drivers and libraries. This happens
>> just because more than a half of DPDK drivers and libraries are not
>> needed for the particular reason. For example, you don't need
>> dpaa*, octeon*, various crypto devices, eventdev, etc. if you
>> only want to build OVS for x86_64 with static linking.
>>
>> By disabling everything you don't need, build speeds up literally 10x.
>> times. This is important for CI systems. For example, TravisCI wastes
>> 10 minutes for the default DPDK build just to check linking with OVS.
>>
>> Another thing is the binary size. Number of DPDK libraries and,
>> as a result, size of resulted statically linked application decreases
>> significantly.
>>
>> Important thing also that you're able to not install some dependencies
>> if you don't have them on a target platform. Just disable libs/drivers
>> that depends on it. Similar thing for the glibc version mismatch
>> between build and target platforms.
>>
>> Also, I have to note that less code means less probability of
>> failures and less number of attack vectors.
>>
>> This patch gives 'meson' the power of configurability that we
>> have with 'make'. Using new options it's possible to enable just
>> what you need and nothing more.
>>
>> For example, following cmdline could be used to build almost minimal
>> set of DPDK libs and drivers to check OVS build:
>>
>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \
>> -Ddrivers_bus=pci,vdev  \
>> -Ddrivers_mempool=ring  \
>> -Ddrivers_net=null,virtio,ring  \
>> -Ddrivers_crypto=virtio \
>> -Ddrivers_compress=none \
>> -Ddrivers_event=none\
>> -Ddrivers_baseband=none \
>> -Ddrivers_raw=none  \
>> -Ddrivers_common=none   \
>> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\
>>ethdev,pci,hash,cryptodev,pdump,vhost \
>> -Dapps=none
>>
>> Adding a few real net drivers will give configuration that can be used
>> in production environment.
>>
>> Looks not very pretty, but this could be moved to a script.
>>
>> Build details:
>>
>>   Build targets in project: 57
>>
>>   $ time ninja
>>   real    0m11,528s
>>   user    1m4,137s
>>   sys     0m4,935s
>>
>>   $ du -sh ../dpdk_meson_install/
>>   3,5M    ../dpdk_meson_install/
>>
>> To compare with what we have without these options:
>>
>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
>>   Build targets in project: 434
>>
>>   $ time ninja
>>   real    1m38,963s
>>   user    10m18,624s
>>   sys     0m45,478s
>>
>>   $ du -sh ../dpdk_meson_install/
>>   27M ../dpdk_meson_install/
>>
>> 10x speed up for the user time.
>> 7.7 times size decrease.
>>
>> This is probably not much user-friendly because it's not a Kconfig
>> and dependency tracking in meson is really poor, so it requires
>> usually few iterations to pick correct set of libraries to satisfy
>> all dependencies. However, it's not a big deal. Options intended
>> for a proficient users who knows what they need.
> 
> Hi,
> 
> We talked about this a few times in the past, and it was actually one
> of the design goals to _avoid_ replicating the octopus-like config
> system of the makefiles. That's because it makes the test matrix
> insanely complicated, not to mention the harm to user friendliness,
> among other things.
> 
> If someone doesn't want to use a PMD, they can just avoid installing it
> - it's simple enough.

So how can I do this? I don't think 'ninja install' has such an option.
Also, if you think that it is safe to skip some libs/drivers in the installation
process, it must be safe to not build them at all. It's just a waste of
time and computational resources to build something known to be unused.
And if you're going to ship DPDK libraries separately in distros, you'll
have to test their different combinations anyway. If they're so independent
that you don't need to test them in various combinations, then your point
about the test matrix is not valid.

> 
> Sorry, but from me it's a very strong NACK.

Sorry, but let me disagree with you. For me, meson configurability is
essential to have before deprecating the 'make' build system.
DPDK was and (in most cases) still is a library that users statically
link into a single application built for a particular platform and do not use
for anything else. This means that the user in most cases knows which parts
are needed and which parts will never be used. The current meson build system
doesn't allow disabling anything, forcing users to link with a whole bunch
of unused code.

One major case is that you have to have build

[dpdk-dev] Short term stable branches/releases

2019-05-30 Thread Kevin Traynor
Hi All,

A reminder that there is no longer a default in practice of having short
term stable branches/releases for xx.02/05/08 DPDK master releases.

Note, this is relevant for xx.02/05/08 based short term (~3 month)
stables only. DPDK LTS based off xx.11 is *not* changing.

This is to allow more time for maintenance and validation of master and
LTS branches/releases as it seems to be where the community are most
interested.

There can still be short term stable branches/releases for individual
xx.02/05/08 releases if there is a particular need and a commitment from
community members to maintain/validate.

See http://doc.dpdk.org/guides/contributing/stable.html#stable-releases
for further details.

thanks,
Kevin.


Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors

2019-05-30 Thread Bruce Richardson
On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote:
> 30/05/2019 09:31, David Marchand:
> > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
> > step...@networkplumber.org> wrote:
> > 
> > > On Thu, 30 May 2019 00:46:30 +0200
> > > Thomas Monjalon  wrote:
> > >
> > > > 23/05/2019 15:58, David Marchand:
> > > > > From: Stephen Hemminger 
> > > > >
> > > > > The fields of the internal EAL core configuration are currently
> > > > > laid bare as part of the API. This is not good practice and limits
> > > > > fixing issues with layout and sizes.
> > > > >
> > > > > Make new accessor functions for the fields used by current drivers
> > > > > and examples.
> > > > [...]
> > > > > +DPDK_19.08 {
> > > > > +   global:
> > > > > +
> > > > > +   rte_lcore_cpuset;
> > > > > +   rte_lcore_index;
> > > > > +   rte_lcore_to_cpu_id;
> > > > > +   rte_lcore_to_socket_id;
> > > > > +
> > > > > +} DPDK_19.05;
> > > > > +
> > > > >  EXPERIMENTAL {
> > > > > global:
> > > >
> > > > Just to make sure, are we OK to introduce these functions
> > > > as non-experimental?
> > >
> > > They were in previous releases as inlines this patch converts them
> > > to real functions.
> > >
> > >
> > Well, yes and no.
> > 
> > rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
> > part of the ABI is fine for me.
> > 
> > rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
> > adding it to the ABI is ok for me.
> 
> It is used by DPAA and some test.
> I guess adding as experimental is fine too?
> I'm fine with both options, I'm just trying to apply the policy
> we agreed on. Does this case deserve an exception?
> 

While it may be a good candidate, I'm not sure how much making an exception
for it really matters. I'd be tempted to just mark it experimental and then
have it stable for the 19.11 release. What do we really lose by waiting a
release to stabilize it?



Re: [dpdk-dev] [PATCH 00/25] Make shared memory config non-public

2019-05-30 Thread Bruce Richardson
On Thu, May 30, 2019 at 09:07:44AM +0100, Burakov, Anatoly wrote:
> On 29-May-19 9:11 PM, David Marchand wrote:
> > On Wed, May 29, 2019 at 6:31 PM Anatoly Burakov 
> > wrote:
> > 
> > > This patchset removes the shared memory config from public
> > > API, and replaces all usages of said config with new API
> > > calls.
> > > 
> > > The patchset is mostly a search-and-replace job and should
> > > be pretty easy to review. However, the changes to ENA
> > > 
> > 
> > I went and did the same job with some scripts.
> > 
> > Not sure you really need to split in all those patches.
> > We are not going to backport this.
> 
> The "separate commits" thing is made for the benefit of reviewers, not
> backporters. In my experience it's much easier to get a maintainer to review
> a smaller patch than it is to look through a wall of irrelevant changes.
> 
> That said, for trivial changes such as these, maybe this is indeed
> unnecessary.
> 
> > Some changes are mixed, the kni changes are in the hash: patch.
> 
> Oops, will fix, thanks for pointing it out!
> 
> > 
> > 
> > I spotted a missed qlock in :
> > lib/librte_eal/common/eal_common_tailqs.c:
> >   rte_rwlock_read_lock(&mcfg->qlock);
> > lib/librte_eal/common/eal_common_tailqs.c:
> >   rte_rwlock_read_unlock(&mcfg->qlock);
> > 
> > 
> > On the names of the functions, could we have something shorter ?
> > The prefix rte_eal_mcfg_ is not necessary from my pov.
> 
> I can drop the mcfg, but IMO all of these locking functions should be kept
> under one namespace, and rte_eal_ is too broad.
> 

I think most/all developers are aware that memory is part of eal, so
rte_mcfg_ prefix (or rte_memcfg) might work.


[dpdk-dev] [PATCH v1 2/9] net/mlx5: add log file procedure for debug data

2019-05-30 Thread Matan Azrad
Add a global function in the PMD which dumps debug information to a
specific file.

The data can be printed in hexadecimal format or as a regular string.

The number of debug files per PMD entity should be limited by a new PMD
probe parameter called max_dump_files_num.

The files will be created in the /var/log directory or in the current
directory.
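
For readers who want to see the shape of this behaviour outside the
driver, here is a minimal, self-contained sketch in plain C. It is not the
code from this patch: open_dump_file and dump_payload are hypothetical
names, and the hex layout (16 bytes per line) is an assumption made for
the example.

```c
#include <stdio.h>

/*
 * Try to open <primary_dir>/<fname> for appending; if that fails, fall
 * back to ./<fname>. The path actually used is left in 'path'.
 * Returns NULL if neither location can be opened.
 */
static FILE *
open_dump_file(const char *primary_dir, const char *fname,
	       char *path, size_t path_len)
{
	FILE *f;
	int n;

	n = snprintf(path, path_len, "%s/%s", primary_dir, fname);
	if (n > 0 && (size_t)n < path_len) {
		f = fopen(path, "a+");
		if (f != NULL)
			return f;
	}
	n = snprintf(path, path_len, "./%s", fname);
	if (n < 0 || (size_t)n >= path_len)
		return NULL;
	return fopen(path, "a+");
}

/*
 * Write 'len' bytes of 'buf' to 'f'. If hex_title is not NULL, print it
 * as a header and dump the bytes in hexadecimal; otherwise print the
 * buffer as plain text.
 */
static void
dump_payload(FILE *f, const char *hex_title, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	if (hex_title == NULL) {
		fprintf(f, "%.*s\n", (int)len, (const char *)buf);
		return;
	}
	fprintf(f, "%s\n", hex_title);
	for (i = 0; i < len; i++) {
		/* End the line after 16 bytes or after the last byte. */
		int eol = ((i + 1) % 16 == 0) || (i + 1 == len);

		fprintf(f, "%02x%s", p[i], eol ? "\n" : " ");
	}
}
```

A caller would typically open the file once with open_dump_file(), call
dump_payload() for each buffer of interest, and then fclose() the file.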

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 doc/guides/nics/mlx5.rst |  7 +++
 drivers/net/mlx5/mlx5.c  |  8 
 drivers/net/mlx5/mlx5.h  |  1 +
 drivers/net/mlx5/mlx5_rxtx.c | 44 
 drivers/net/mlx5/mlx5_rxtx.h |  2 ++
 5 files changed, 62 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 325e9f6..aa89bd9 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -507,6 +507,13 @@ Run-time configuration
 
 representor=[0-2]
 
+- ``max_dump_files_num`` parameter [int]
+
+  The maximum number of files per PMD entity that may be created for debug information.
+  The files will be created in /var/log directory or in current directory.
+
+  set to 128 by default.
+
 Firmware configuration
 ~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9f5ec97..ebb49c8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -116,6 +116,9 @@
 /* Select port representors to instantiate. */
 #define MLX5_REPRESENTOR "representor"
 
+/* Device parameter to configure the maximum number of dump files per queue. */
+#define MLX5_MAX_DUMP_FILES_NUM "max_dump_files_num"
+
 #ifndef HAVE_IBV_MLX5_MOD_MPW
 #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2)
 #define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3)
@@ -926,6 +929,8 @@ struct mlx5_dev_spawn_data {
config->dv_flow_en = !!tmp;
} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
config->mr_ext_memseg_en = !!tmp;
+   } else if (strcmp(MLX5_MAX_DUMP_FILES_NUM, key) == 0) {
+   config->max_dump_files_num = tmp;
} else {
DRV_LOG(WARNING, "%s: unknown parameter", key);
rte_errno = EINVAL;
@@ -970,6 +975,7 @@ struct mlx5_dev_spawn_data {
MLX5_DV_FLOW_EN,
MLX5_MR_EXT_MEMSEG_EN,
MLX5_REPRESENTOR,
+   MLX5_MAX_DUMP_FILES_NUM,
NULL,
};
struct rte_kvargs *kvlist;
@@ -1433,6 +1439,8 @@ struct mlx5_dev_spawn_data {
DRV_LOG(WARNING, "Multi-Packet RQ isn't supported");
config.mprq.enabled = 0;
}
+   if (config.max_dump_files_num == 0)
+   config.max_dump_files_num = 128;
eth_dev = rte_eth_dev_allocate(name);
if (eth_dev == NULL) {
DRV_LOG(ERR, "can not allocate rte ethdev");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3eaaafd..4c339d0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -204,6 +204,7 @@ struct mlx5_dev_config {
unsigned int flow_prio; /* Number of flow priorities. */
unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */
unsigned int ind_table_max_size; /* Maximum indirection table size. */
+   unsigned int max_dump_files_num; /* Maximum dump files per queue. */
int txq_inline; /* Maximum packet size for inlining. */
int txqs_inline; /* Queue number threshold for inlining. */
int txqs_vec; /* Queue number threshold for vectorized Tx. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 3da3f62..2c8d066 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -524,6 +524,50 @@
return rx_queue_count(rxq);
 }
 
+#define MLX5_SYSTEM_LOG_DIR "/var/log"
+/**
+ * Dump debug information to log file.
+ *
+ * @param fname
+ *   The file name.
+ * @param hex_title
+ *   If not NULL this string is printed as a header to the output
+ *   and the output will be in hexadecimal view.
+ * @param buf
+ *   This is the buffer address to print out.
+ * @param hex_len
+ *   The number of bytes to dump out.
+ */
+void
+mlx5_dump_debug_information(const char *fname, const char *hex_title,
+   const void *buf, unsigned int hex_len)
+{
+   FILE *fd;
+
+   MKSTR(path, "%s/%s", MLX5_SYSTEM_LOG_DIR, fname);
+   fd = fopen(path, "a+");
+   if (!fd) {
+   DRV_LOG(WARNING, "cannot open %s for debug dump\n",
+   path);
+   MKSTR(path2, "./%s", fname);
+   fd = fopen(path2, "a+");
+   if (!fd) {
+   DRV_LOG(ERR, "cannot open %s for debug dump\n",
+   path2);
+   return;
+   }
+   DRV_LOG(INFO, "New debug dump in file %s\n", path2);
+   } else {
+   DRV_LOG(INFO, "New debug dump in file %s\n", path);
+   }
+   if (hex_title)
+ 

[dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error

2019-05-30 Thread Matan Azrad
Add support for data-path Rx and Tx completions with error handling:

1. Detect the error.
2. Do not crash.
3. Report it in statistics counters.
4. Dump debug information to system log file.
5. Recover the error under the hood.
6. Add support for secondary process recovery.

No performance impact was shown. 

Matan Azrad (9):
  net/mlx5: remove Rx queues indexes correlation
  net/mlx5: add log file procedure for debug data
  net/mlx5: fix device arguments error detection
  net/mlx5: mitigate Rx doorbell memory barrier
  net/mlx5: separate Rx queue initialization
  net/mlx5: extend Rx completion with error handling
  net/mlx5: handle Tx completion with error
  net/mlx5: recover secondary process Rx errors
  net/mlx5: recover secondary process Tx errors

 doc/guides/nics/mlx5.rst  |   7 +
 drivers/net/mlx5/mlx5.c   |  14 +-
 drivers/net/mlx5/mlx5.h   |  12 +
 drivers/net/mlx5/mlx5_mp.c|  46 +++
 drivers/net/mlx5/mlx5_prm.h   |  11 +
 drivers/net/mlx5/mlx5_rxq.c   |  42 +--
 drivers/net/mlx5/mlx5_rxtx.c  | 673 --
 drivers/net/mlx5/mlx5_rxtx.h  | 193 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c  |   5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  36 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  36 +-
 drivers/net/mlx5/mlx5_trigger.c   |   1 +
 drivers/net/mlx5/mlx5_txq.c   |   4 +-
 13 files changed, 792 insertions(+), 288 deletions(-)

-- 
1.8.3.1



[dpdk-dev] [PATCH v1 3/9] net/mlx5: fix device arguments error detection

2019-05-30 Thread Matan Azrad
When bad device arguments are added to the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.

This behavior doesn't make sense because the user's intention is to
force some device parameters, and the user expects an error if the
arguments are problematic.

Stop probing and report an error in case of problematic command line
arguments.
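
As a minimal sketch of the behavior described above (not the PMD code
itself), the toy parser below stands in for rte_kvargs_parse(); all
toy_* names are hypothetical and exist only for illustration. A
malformed argument string makes the check fail with a negative errno
value instead of silently falling back to defaults:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed key/value list. */
struct toy_kvlist {
	int count; /* Number of key=value pairs found. */
};

/*
 * Toy parser (assumption, not the real rte_kvargs_parse()):
 * accepts "key=value[,key=value...]" and returns NULL on a token
 * that has no '=' or has an empty key.
 */
static struct toy_kvlist *
toy_kvargs_parse(const char *args, struct toy_kvlist *storage)
{
	const char *p = args;

	assert(storage != NULL);
	storage->count = 0;
	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		const char *eq = memchr(p, '=', len);

		if (eq == NULL || eq == p)
			return NULL;
		storage->count++;
		p += len;
		if (*p == ',')
			p++;
	}
	return storage;
}

/* Return 0 on success, a negative errno value on bad arguments. */
static int
toy_args_check(const char *args)
{
	struct toy_kvlist kv;

	if (args == NULL)
		return 0; /* No arguments given: defaults are fine. */
	if (toy_kvargs_parse(args, &kv) == NULL) {
		errno = EINVAL;
		return -errno; /* Stop instead of ignoring the input. */
	}
	return 0;
}
```

The point of the sketch is only the return path: a parse failure is
reported to the caller rather than treated like "no arguments".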

Fixes: e72dd09b614e ("net/mlx5: add support for configuration through kvargs")
Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ebb49c8..23e397e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -986,8 +986,10 @@ struct mlx5_dev_spawn_data {
return 0;
/* Following UGLY cast is done to pass checkpatch. */
kvlist = rte_kvargs_parse(devargs->args, params);
-   if (kvlist == NULL)
-   return 0;
+   if (kvlist == NULL) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
/* Process parameters. */
for (i = 0; (params[i] != NULL); ++i) {
if (rte_kvargs_count(kvlist, params[i])) {
-- 
1.8.3.1



[dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation

2019-05-30 Thread Matan Azrad
There is a full correlation between the CQE indexes and the WQE indexes
in the vectorized Rx queues management.

When the RQ is moved to the reset state, the correlation may break
because the HW restarts the RQ polling from index 0 while the CQ polling
continues regularly.

In preparation for CQE error handling, where the RQ can be reset,
the correlation dependence should be removed from all the Rx queue
index management.

Remove the aforementioned dependence from the vectorized Rx burst
functions.
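
The difference between deriving the pending count from the indexes and
keeping an explicit counter can be sketched as follows. This is a toy
model under stated assumptions: the struct and the toy_* helpers are
hypothetical, and only the field names rq_pi, cq_ci and decompressed
are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy Rx queue holding only the fields relevant to the sketch. */
struct toy_rxq {
	uint16_t rq_pi;        /* Next SW ring entry to return to the app. */
	uint16_t cq_ci;        /* CQ consumer index. */
	uint16_t decompressed; /* Ready mbufs not yet returned. */
};

/* Old scheme: pending packets derived from the index difference. */
static uint16_t
toy_pending_by_index(const struct toy_rxq *q)
{
	return (uint16_t)(q->cq_ci - q->rq_pi);
}

/* New scheme: pending packets tracked by an explicit counter. */
static uint16_t
toy_pending_by_counter(const struct toy_rxq *q)
{
	return q->decompressed;
}

/* Consume n completions from the CQ. */
static void
toy_decompress(struct toy_rxq *q, uint16_t n)
{
	q->cq_ci += n;
	q->decompressed += n;
}

/* Return up to max ready packets to the application. */
static uint16_t
toy_return_pkts(struct toy_rxq *q, uint16_t max)
{
	uint16_t n = q->decompressed < max ? q->decompressed : max;

	q->rq_pi += n;
	q->decompressed -= n;
	return n;
}

/* Toy RQ reset: the RQ restarts from 0 while the CQ index continues. */
static void
toy_rq_reset(struct toy_rxq *q)
{
	q->rq_pi = 0;
}
```

After toy_rq_reset() the index difference no longer describes the
number of pending packets, while the counter stays correct.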

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_rxq.c   |  1 +
 drivers/net/mlx5/mlx5_rxtx.h  |  6 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 26 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  | 26 +-
 4 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a00cb12..b248f38 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1006,6 +1006,7 @@ struct mlx5_rxq_ibv *
rxq_data->cq_uar = cq_info.cq_uar;
rxq_data->cqn = cq_info.cqn;
rxq_data->cq_arm_sn = 0;
+   rxq_data->decompressed = 0;
/* Update doorbell counter. */
rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
rte_wmb();
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4339aaf..7bacdba 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -101,11 +101,15 @@ struct mlx5_rxq_data {
uint32_t rq_pi;
uint32_t cq_ci;
uint16_t rq_repl_thresh; /* Threshold for buffer replenishment. */
+   union {
+   struct rxq_zip zip; /* Compressed context. */
+   uint16_t decompressed;
+   /* Number of ready mbufs decompressed from the CQ. */
+   };
struct mlx5_mr_ctrl mr_ctrl; /* MR control descriptor. */
uint16_t mprq_max_memcpy_len; /* Maximum size of packet to memcpy. */
volatile void *wqes;
volatile struct mlx5_cqe(*cqes)[];
-   struct rxq_zip zip; /* Compressed context. */
RTE_STD_C11
union  {
struct rte_mbuf *(*elts)[];
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 38e915c..6a1b2bb 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -352,8 +352,11 @@
  * @param elts
  *   Pointer to SW ring to be filled. The first mbuf has to be pre-built from
  *   the title completion descriptor to be copied to the rest of mbufs.
+ *
+ * @return
+ *   Number of mini-CQEs successfully decompressed.
  */
-static inline void
+static inline uint16_t
 rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
struct rte_mbuf **elts)
 {
@@ -505,6 +508,7 @@
rxq->stats.ibytes += rcvd_byte;
 #endif
rxq->cq_ci += mcqe_n;
+   return mcqe_n;
 }
 
 /**
@@ -729,24 +733,17 @@
rte_prefetch_non_temporal(cq + 2);
rte_prefetch_non_temporal(cq + 3);
pkts_n = RTE_MIN(pkts_n, MLX5_VPMD_RX_MAX_BURST);
-   /*
-* Order of indexes:
-*   rq_ci >= cq_ci >= rq_pi
-* Definition of indexes:
-*   rq_ci - cq_ci := # of buffers owned by HW (posted).
-*   cq_ci - rq_pi := # of buffers not returned to app (decompressed).
-*   N - (rq_ci - rq_pi) := # of buffers consumed (to be replenished).
-*/
repl_n = q_n - (rxq->rq_ci - rxq->rq_pi);
if (repl_n >= rxq->rq_repl_thresh)
mlx5_rx_replenish_bulk_mbuf(rxq, repl_n);
/* See if there're unreturned mbufs from compressed CQE. */
-   rcvd_pkt = rxq->cq_ci - rxq->rq_pi;
+   rcvd_pkt = rxq->decompressed;
if (rcvd_pkt > 0) {
rcvd_pkt = RTE_MIN(rcvd_pkt, pkts_n);
rxq_copy_mbuf_v(rxq, pkts, rcvd_pkt);
rxq->rq_pi += rcvd_pkt;
pkts += rcvd_pkt;
+   rxq->decompressed -= rcvd_pkt;
}
elts_idx = rxq->rq_pi & q_mask;
elts = &(*rxq->elts)[elts_idx];
@@ -754,10 +751,11 @@
pkts_n = RTE_ALIGN_FLOOR(pkts_n - rcvd_pkt, MLX5_VPMD_DESCS_PER_LOOP);
/* Not to cross queue end. */
pkts_n = RTE_MIN(pkts_n, q_n - elts_idx);
+   pkts_n = RTE_MIN(pkts_n, q_n - cq_idx);
if (!pkts_n)
return rcvd_pkt;
/* At this point, there shouldn't be any remained packets. */
-   assert(rxq->rq_pi == rxq->cq_ci);
+   assert(rxq->decompressed == 0);
/*
 * Note that vectors have reverse order - {v3, v2, v1, v0}, because
 * there's no instruction to count trailing zeros. __builtin_clzl() is
@@ -1003,15 +1001,17 @@
/* Decompress the last CQE if compressed. */
if (comp_idx < MLX5_VPMD_DESCS_PER_LOOP && comp_idx == n) {
assert(comp_idx == (nocmp_n % MLX5_VPMD_DESCS_PER_LOOP));
-   rxq_cq_

[dpdk-dev] [PATCH v1 4/9] net/mlx5: mitigate Rx doorbell memory barrier

2019-05-30 Thread Matan Azrad
The RQ WQEs must be written in the memory before the HW gets the RQ
doorbell, hence a memory barrier should be triggered after the WQEs
writing and before the doorbell writing.

The current code uses the rte_wmb barrier, which ensures that all the
memory stores are done, while the rte_cio_wmb barrier is enough for
local memory stores, and the WQEs are in local memory.

CC: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_rxq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b248f38..282295f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1009,7 +1009,7 @@ struct mlx5_rxq_ibv *
rxq_data->decompressed = 0;
/* Update doorbell counter. */
rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
-   rte_wmb();
+   rte_cio_wmb();
*rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci);
DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
idx, (void *)&tmpl);
-- 
1.8.3.1



[dpdk-dev] [PATCH v1 5/9] net/mlx5: separate Rx queue initialization

2019-05-30 Thread Matan Azrad
Move the RQ WQEs initialization code to a separate function so that it
can be reused by the CQE error recovery.

CC: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_rxq.c  | 43 ++-
 drivers/net/mlx5/mlx5_rxtx.c | 53 
 2 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 282295f..90e8c49 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -779,7 +779,6 @@ struct mlx5_rxq_ibv *
struct mlx5_rxq_ibv *tmpl;
struct mlx5dv_cq cq_info;
struct mlx5dv_rwq rwq;
-   unsigned int i;
int ret = 0;
struct mlx5dv_obj obj;
struct mlx5_dev_config *config = &priv->config;
@@ -964,53 +963,15 @@ struct mlx5_rxq_ibv *
}
/* Fill the rings. */
rxq_data->wqes = rwq.buf;
-   for (i = 0; (i != wqe_n); ++i) {
-   volatile struct mlx5_wqe_data_seg *scat;
-   uintptr_t addr;
-   uint32_t byte_count;
-
-   if (mprq_en) {
-   struct mlx5_mprq_buf *buf = (*rxq_data->mprq_bufs)[i];
-
-   scat = &((volatile struct mlx5_wqe_mprq *)
-rxq_data->wqes)[i].dseg;
-   addr = (uintptr_t)mlx5_mprq_buf_addr(buf);
-   byte_count = (1 << rxq_data->strd_sz_n) *
-(1 << rxq_data->strd_num_n);
-   } else {
-   struct rte_mbuf *buf = (*rxq_data->elts)[i];
-
-   scat = &((volatile struct mlx5_wqe_data_seg *)
-rxq_data->wqes)[i];
-   addr = rte_pktmbuf_mtod(buf, uintptr_t);
-   byte_count = DATA_LEN(buf);
-   }
-   /* scat->addr must be able to store a pointer. */
-   assert(sizeof(scat->addr) >= sizeof(uintptr_t));
-   *scat = (struct mlx5_wqe_data_seg){
-   .addr = rte_cpu_to_be_64(addr),
-   .byte_count = rte_cpu_to_be_32(byte_count),
-   .lkey = mlx5_rx_addr2mr(rxq_data, addr),
-   };
-   }
rxq_data->rq_db = rwq.dbrec;
rxq_data->cqe_n = log2above(cq_info.cqe_cnt);
-   rxq_data->cq_ci = 0;
-   rxq_data->consumed_strd = 0;
-   rxq_data->rq_pi = 0;
-   rxq_data->zip = (struct rxq_zip){
-   .ai = 0,
-   };
rxq_data->cq_db = cq_info.dbrec;
rxq_data->cqes = (volatile struct mlx5_cqe (*)[])(uintptr_t)cq_info.buf;
rxq_data->cq_uar = cq_info.cq_uar;
rxq_data->cqn = cq_info.cqn;
rxq_data->cq_arm_sn = 0;
-   rxq_data->decompressed = 0;
-   /* Update doorbell counter. */
-   rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
-   rte_cio_wmb();
-   *rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci);
+   mlx5_rxq_initialize(rxq_data);
+   rxq_data->cq_ci = 0;
DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
idx, (void *)&tmpl);
rte_atomic32_inc(&tmpl->refcnt);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 2c8d066..aec0185 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1831,6 +1831,59 @@
 }
 
 /**
+ * Initialize Rx WQ and indexes.
+ *
+ * @param[in] rxq
+ *   Pointer to RX queue structure.
+ */
+void
+mlx5_rxq_initialize(struct mlx5_rxq_data *rxq)
+{
+   const unsigned int wqe_n = 1 << rxq->elts_n;
+   unsigned int i;
+
+   for (i = 0; (i != wqe_n); ++i) {
+   volatile struct mlx5_wqe_data_seg *scat;
+   uintptr_t addr;
+   uint32_t byte_count;
+
+   if (mlx5_rxq_mprq_enabled(rxq)) {
+   struct mlx5_mprq_buf *buf = (*rxq->mprq_bufs)[i];
+
+   scat = &((volatile struct mlx5_wqe_mprq *)
+   rxq->wqes)[i].dseg;
+   addr = (uintptr_t)mlx5_mprq_buf_addr(buf);
+   byte_count = (1 << rxq->strd_sz_n) *
+   (1 << rxq->strd_num_n);
+   } else {
+   struct rte_mbuf *buf = (*rxq->elts)[i];
+
+   scat = &((volatile struct mlx5_wqe_data_seg *)
+   rxq->wqes)[i];
+   addr = rte_pktmbuf_mtod(buf, uintptr_t);
+   byte_count = DATA_LEN(buf);
+   }
+   /* scat->addr must be able to store a pointer. */
+   assert(sizeof(scat->addr) >= sizeof(uintptr_t));
+   *scat = (struct mlx5_wqe_data_seg){
+   .addr = rte_cpu_to_be_64(addr),
+   .byte_count = rte_cpu_to_be_32(byte_count),
+   .lkey = mlx5_rx_addr2mr(rxq, addr),
+ 

[dpdk-dev] [PATCH v1 8/9] net/mlx5: recover secondary process Rx errors

2019-05-30 Thread Matan Azrad
The RQ error recovery mechanism in the PMD invokes Verbs functions to
modify the RQ states in order to reset the RQ and to reactivate it.

These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured by
a secondary process queue.

Using the DPDK IPC mechanism, the secondary process can request Verbs
queue state modifications to be done synchronously by the primary
process.

Add support for Rx error recovery in secondary processes.

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.h | 11 +
 drivers/net/mlx5/mlx5_mp.c  | 46 +++
 drivers/net/mlx5/mlx5_rxtx.c| 98 +
 drivers/net/mlx5/mlx5_rxtx.h|  3 ++
 drivers/net/mlx5/mlx5_trigger.c |  1 +
 5 files changed, 141 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4c339d0..85a6d02 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -61,6 +61,13 @@ enum mlx5_mp_req_type {
MLX5_MP_REQ_CREATE_MR,
MLX5_MP_REQ_START_RXTX,
MLX5_MP_REQ_STOP_RXTX,
+   MLX5_MP_REQ_QUEUE_STATE_MODIFY,
+};
+
+struct mlx5_mp_arg_queue_state_modify {
+   uint8_t is_wq; /* Set if WQ. */
+   uint16_t queue_id; /* DPDK queue ID. */
+   enum ibv_wq_state state; /* WQ requested state. */
 };
 
 /* Pameters for IPC. */
@@ -71,6 +78,8 @@ struct mlx5_mp_param {
RTE_STD_C11
union {
uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
+   struct mlx5_mp_arg_queue_state_modify state_modify;
+   /* MLX5_MP_REQ_QUEUE_STATE_MODIFY */
} args;
 };
 
@@ -542,6 +551,8 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev);
 int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
 int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
+int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
+  struct mlx5_mp_arg_queue_state_modify *sm);
 void mlx5_mp_init_primary(void);
 void mlx5_mp_uninit_primary(void);
 void mlx5_mp_init_secondary(void);
diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c
index cea74ad..3ccae51 100644
--- a/drivers/net/mlx5/mlx5_mp.c
+++ b/drivers/net/mlx5/mlx5_mp.c
@@ -85,6 +85,12 @@
res->result = 0;
ret = rte_mp_reply(&mp_res, peer);
break;
+   case MLX5_MP_REQ_QUEUE_STATE_MODIFY:
+   mp_init_msg(dev, &mp_res, param->type);
+   res->result = mlx5_queue_state_modify_primary
+   (dev, &param->args.state_modify);
+   ret = rte_mp_reply(&mp_res, peer);
+   break;
default:
rte_errno = EINVAL;
DRV_LOG(ERR, "port %u invalid mp request type",
@@ -271,6 +277,46 @@
 }
 
 /**
+ * Request Verbs queue state modification to the primary process.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet structure.
+ * @param sm
+ *   State modify parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
+  struct mlx5_mp_arg_queue_state_modify *sm)
+{
+   struct rte_mp_msg mp_req;
+   struct rte_mp_msg *mp_res;
+   struct rte_mp_reply mp_rep;
+   struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param;
+   struct mlx5_mp_param *res;
+   struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+   int ret;
+
+   assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+   mp_init_msg(dev, &mp_req, MLX5_MP_REQ_QUEUE_STATE_MODIFY);
+   req->args.state_modify = *sm;
+   ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+   if (ret) {
+   DRV_LOG(ERR, "port %u request to primary process failed",
+   dev->data->port_id);
+   return -rte_errno;
+   }
+   assert(mp_rep.nb_received == 1);
+   mp_res = &mp_rep.msgs[0];
+   res = (struct mlx5_mp_param *)mp_res->param;
+   ret = res->result;
+   free(mp_rep.msgs);
+   return ret;
+}
+
+/**
  * Request Verbs command file descriptor for mmap to the primary process.
  *
  * @param[in] dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 36e2dd3..cb3baad 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -2031,6 +2031,75 @@
 }
 
 /**
+ * Modify a Verbs queue state.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param sm
+ *   State modify request parameters.
+ *
+ * @return
+ *   0 in case of success else non-zero value and rte_errno is set.
+ */
+int
+mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
+   const struct mlx5_mp_arg_queue_state

[dpdk-dev] [PATCH v1 6/9] net/mlx5: extend Rx completion with error handling

2019-05-30 Thread Matan Azrad
When WQEs are posted to the HW to receive packets, the PMD may receive
a completion report with an error from the HW, aka an error CQE, which
is associated with a bad WQE.

The error reason may be a bad address, a wrong lkey, a small buffer
size, etc., any of which can be wrongly configured by the PMD or by the
user.

Checking for all the possible mistakes to prevent error CQEs doesn't
make sense due to the performance impact. Moreover, some error CQEs can
be triggered by packets coming from the wire, over which the DPDK
application has no control.

Most of the error CQE types change the RQ state to the error state,
which causes all subsequently received packets to be dropped by the HW
and to be completed with a CQE flush error forever.

The current solution detects these error CQEs and even reports the
errors to the user through the statistics error counters, but without
recovery: once the RQ enters the error state it never moves to the
ready state again and all subsequent packets are dropped.

Extend the error CQE handling with recovery by moving the state to
ready again, and rearranging all the RQ WQEs and the management
variables appropriately.

Sometimes the root cause of an error CQE is very hard to debug and may
even be related to corner cases which are not easily reproducible.
Hence a dump file with debug information is created for the first
few error CQEs; their number can be configured by the PMD probe
parameters.

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_rxtx.c | 328 +++
 drivers/net/mlx5/mlx5_rxtx.h | 101 
 drivers/net/mlx5/mlx5_rxtx_vec.c |   5 +-
 3 files changed, 266 insertions(+), 168 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index aec0185..5369fc1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -444,7 +445,7 @@
cq_ci = rxq->cq_ci;
}
cqe = &(*rxq->cqes)[cq_ci & cqe_cnt];
-   while (check_cqe(cqe, cqe_n, cq_ci) == 0) {
+   while (check_cqe(cqe, cqe_n, cq_ci) != MLX5_CQE_STATUS_HW_OWN) {
int8_t op_own;
unsigned int n;
 
@@ -1884,6 +1885,130 @@
 }
 
 /**
+ * Handle a Rx error.
+ * The function moves the RQ state to reset when the first error CQE is
+ * seen, then the caller's loop drains the CQ. When the CQ is empty,
+ * it moves the RQ state to ready and initializes the RQ.
+ * Next CQE identification and error counting are the caller's responsibility.
+ *
+ * @param[in] rxq
+ *   Pointer to RX queue structure.
+ * @param[in] mbuf_prepare
+ *   Whether to prepare mbufs for the RQ.
+ *
+ * @return
+ *   -1 in case of recovery error, otherwise the CQE status.
+ */
+int
+mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t mbuf_prepare)
+{
+   const uint16_t cqe_n = 1 << rxq->cqe_n;
+   const uint16_t cqe_mask = cqe_n - 1;
+   const unsigned int wqe_n = 1 << rxq->elts_n;
+   struct mlx5_rxq_ctrl *rxq_ctrl =
+   container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+   struct ibv_wq_attr mod = {
+   .attr_mask = IBV_WQ_ATTR_STATE,
+   };
+   union {
+   volatile struct mlx5_cqe *cqe;
+   volatile struct mlx5_err_cqe *err_cqe;
+   } u = {
+   .cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask],
+   };
+   int ret;
+
+   switch (rxq->err_state) {
+   case MLX5_RXQ_ERR_STATE_NO_ERROR:
+   rxq->err_state = MLX5_RXQ_ERR_STATE_NEED_RESET;
+   /* Fall-through */
+   case MLX5_RXQ_ERR_STATE_NEED_RESET:
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return -1;
+   mod.wq_state = IBV_WQS_RESET;
+   ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod);
+   if (ret) {
+   DRV_LOG(ERR, "Cannot change Rx WQ state to RESET %s\n",
+   strerror(errno));
+   return -1;
+   }
+   if (rxq_ctrl->dump_file_n <
+   rxq_ctrl->priv->config.max_dump_files_num) {
+   MKSTR(err_str, "Unexpected CQE error syndrome "
+ "0x%02x CQN = %u RQN = %u wqe_counter = %u"
+ " rq_ci = %u cq_ci = %u", u.err_cqe->syndrome,
+ rxq->cqn, rxq_ctrl->ibv->wq->wq_num,
+ rte_be_to_cpu_16(u.err_cqe->wqe_counter),
+ rxq->rq_ci << rxq->sges_n, rxq->cq_ci);
+   MKSTR(name, "dpdk_mlx5_port_%u_rxq_%u_%u",
+ rxq->port_id, rxq->idx, (uint32_t)rte_rdtsc());
+   mlx5_dump_debug_information(name, NULL, err_str, 0);
+   mlx5_dump_debug_information(name, "MLX5 Error CQ:",
+  

[dpdk-dev] [PATCH v1 9/9] net/mlx5: recover secondary process Tx errors

2019-05-30 Thread Matan Azrad
The SQ error recovery mechanism in the PMD invokes Verbs functions
to modify the SQ states in order to reset the SQ and to reactivate it.

These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured
by a secondary process queue.

Using the DPDK IPC mechanism, the secondary process can request Verbs
queue state modifications to be done synchronously by the primary
process.

Add support for Tx error recovery in secondary processes.

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_rxtx.c | 104 ++-
 1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index cb3baad..9659478 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -51,6 +51,10 @@
 static __rte_always_inline void
 mprq_buf_replace(struct mlx5_rxq_data *rxq, uint16_t rq_idx);
 
+static int
+mlx5_queue_state_modify(struct rte_eth_dev *dev,
+   struct mlx5_mp_arg_queue_state_modify *sm);
+
 uint32_t mlx5_ptype_table[] __rte_cache_aligned = {
[0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */
 };
@@ -570,52 +574,27 @@
 }
 
 /**
- * Move QP from error state to running state.
+ * Move QP from error state to running state and initialize indexes.
  *
- * @param txq
- *   Pointer to TX queue structure.
- * @param qp
- *   The qp pointer for recovery.
+ * @param txq_ctrl
+ *   Pointer to TX queue control structure.
  *
  * @return
- *   0 on success, else errno value.
+ *   0 on success, else -1.
  */
 static int
-tx_recover_qp(struct mlx5_txq_data *txq, struct ibv_qp *qp)
+tx_recover_qp(struct mlx5_txq_ctrl *txq_ctrl)
 {
-   int ret;
-   struct ibv_qp_attr mod = {
-   .qp_state = IBV_QPS_RESET,
-   .port_num = 1,
-   };
-   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-   if (ret) {
-   DRV_LOG(ERR, "Cannot change the Tx QP state to RESET %d\n",
-   ret);
-   return ret;
-   }
-   mod.qp_state = IBV_QPS_INIT;
-   ret = mlx5_glue->modify_qp(qp, &mod,
-  (IBV_QP_STATE | IBV_QP_PORT));
-   if (ret) {
-   DRV_LOG(ERR, "Cannot change Tx QP state to INIT %d\n", ret);
-   return ret;
-   }
-   mod.qp_state = IBV_QPS_RTR;
-   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-   if (ret) {
-   DRV_LOG(ERR, "Cannot change Tx QP state to RTR %d\n", ret);
-   return ret;
-   }
-   mod.qp_state = IBV_QPS_RTS;
-   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-   if (ret) {
-   DRV_LOG(ERR, "Cannot change Tx QP state to RTS %d\n", ret);
-   return ret;
-   }
-   txq->wqe_ci = 0;
-   txq->wqe_pi = 0;
-   txq->elts_comp = 0;
+   struct mlx5_mp_arg_queue_state_modify sm = {
+   .is_wq = 0,
+   .queue_id = txq_ctrl->txq.idx,
+   };
+
+   if (mlx5_queue_state_modify(ETH_DEV(txq_ctrl->priv), &sm))
+   return -1;
+   txq_ctrl->txq.wqe_ci = 0;
+   txq_ctrl->txq.wqe_pi = 0;
+   txq_ctrl->txq.elts_comp = 0;
return 0;
 }
 
@@ -690,8 +669,7 @@
 */
txq->stats.oerrors += ((txq->wqe_ci & wqe_m) -
new_wqe_pi) & wqe_m;
-   if ((rte_eal_process_type() == RTE_PROC_PRIMARY) &&
-   tx_recover_qp(txq, txq_ctrl->ibv->qp) == 0) {
+   if (tx_recover_qp(txq_ctrl) == 0) {
txq->cq_ci++;
/* Release all the remaining buffers. */
return txq->elts_head;
@@ -2065,6 +2043,48 @@
rte_errno = errno;
return ret;
}
+   } else {
+   struct mlx5_txq_data *txq = (*priv->txqs)[sm->queue_id];
+   struct mlx5_txq_ctrl *txq_ctrl =
+   container_of(txq, struct mlx5_txq_ctrl, txq);
+   struct ibv_qp_attr mod = {
+   .qp_state = IBV_QPS_RESET,
+   .port_num = (uint8_t)priv->ibv_port,
+   };
+   struct ibv_qp *qp = txq_ctrl->ibv->qp;
+
+   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+   if (ret) {
+   DRV_LOG(ERR, "Cannot change the Tx QP state to RESET "
+   "%s\n", strerror(errno));
+   rte_errno = errno;
+   return ret;
+   }
+   mod.qp_state = IBV_QPS_INIT;
+   ret = mlx5_glue->modify_qp(qp, &mod,
+  (IBV_QP_STATE | IBV_QP_PORT));
+  

[dpdk-dev] [PATCH v1 7/9] net/mlx5: handle Tx completion with error

2019-05-30 Thread Matan Azrad
When WQEs are posted to the HW to send packets, the PMD may get a
completion report with an error from the HW, aka an error CQE, which is
associated with a bad WQE.

The error reason may be a bad address, a wrong lkey, bad sizes, etc.,
any of which can be wrongly configured by the PMD or by the user.

Checking for all the possible mistakes to prevent error CQEs doesn't
make sense due to the performance impact and the huge complexity.

The error CQEs change the SQ state to the error state, which causes all
subsequently posted WQEs to be completed with a CQE flush error forever.

Currently, the PMD doesn't handle Tx error CQEs and may even crash
when one of them appears.

Extend the Tx data-path to detect these error CQEs, to report them
through the statistics error counters, and to recover the SQ by moving
the state to ready again and adjusting the management variables
appropriately.

Sometimes the root cause of an error CQE is very hard to debug and may
even be related to corner cases which are not easily reproducible.
Hence a dump file with debug information is created for the first
few error CQEs; their number can be configured by the PMD probe
parameters.
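
Limiting the dumps to the first few error CQEs can be sketched with a
per-queue counter. The counter names dump_file_n and max_dump_files_num
follow the Rx patch of this series; everything else (the toy_* names
and the file name format) is a hypothetical illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy control structure with the two counters used for capping. */
struct toy_txq_ctrl {
	unsigned int dump_file_n;        /* Dumps created so far. */
	unsigned int max_dump_files_num; /* Configured upper limit. */
};

/*
 * Build the name of the next dump file.
 * Return 1 if a dump should be written, 0 if the limit was reached.
 */
static int
toy_dump_name(struct toy_txq_ctrl *c, unsigned int port,
	      unsigned int queue, char *buf, size_t len)
{
	assert(c != NULL && buf != NULL);
	if (c->dump_file_n >= c->max_dump_files_num)
		return 0;
	snprintf(buf, len, "toy_port_%u_txq_%u_%u", port, queue,
		 c->dump_file_n);
	c->dump_file_n++;
	return 1;
}
```

Once the limit is reached, later error CQEs are still counted and
recovered, but no further files are produced.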

Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5_prm.h   |  11 +++
 drivers/net/mlx5/mlx5_rxtx.c  | 166 --
 drivers/net/mlx5/mlx5_rxtx.h  |  81 ++---
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  10 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  10 +-
 drivers/net/mlx5/mlx5_txq.c   |   4 +-
 6 files changed, 231 insertions(+), 51 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 8c42380..22db86b 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -153,6 +153,17 @@
 /* Maximum number of DS in WQE. */
 #define MLX5_DSEG_MAX 63
 
+/* The completion mode offset in the WQE control segment line 2. */
+#define MLX5_COMP_MODE_OFFSET 2
+
+/* Completion mode. */
+enum mlx5_completion_mode {
+   MLX5_COMP_ONLY_ERR = 0x0,
+   MLX5_COMP_ONLY_FIRST_ERR = 0x1,
+   MLX5_COMP_ALWAYS = 0x2,
+   MLX5_COMP_CQE_AND_EQE = 0x3,
+};
+
 /* Subset of struct mlx5_wqe_eth_seg. */
 struct mlx5_wqe_eth_seg_small {
uint32_t rsvd0;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5369fc1..36e2dd3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -570,6 +570,141 @@
 }
 
 /**
+ * Move QP from error state to running state.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param qp
+ *   The qp pointer for recovery.
+ *
+ * @return
+ *   0 on success, else errno value.
+ */
+static int
+tx_recover_qp(struct mlx5_txq_data *txq, struct ibv_qp *qp)
+{
+   int ret;
+   struct ibv_qp_attr mod = {
+   .qp_state = IBV_QPS_RESET,
+   .port_num = 1,
+   };
+   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+   if (ret) {
+   DRV_LOG(ERR, "Cannot change the Tx QP state to RESET %d\n",
+   ret);
+   return ret;
+   }
+   mod.qp_state = IBV_QPS_INIT;
+   ret = mlx5_glue->modify_qp(qp, &mod,
+  (IBV_QP_STATE | IBV_QP_PORT));
+   if (ret) {
+   DRV_LOG(ERR, "Cannot change Tx QP state to INIT %d\n", ret);
+   return ret;
+   }
+   mod.qp_state = IBV_QPS_RTR;
+   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+   if (ret) {
+   DRV_LOG(ERR, "Cannot change Tx QP state to RTR %d\n", ret);
+   return ret;
+   }
+   mod.qp_state = IBV_QPS_RTS;
+   ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+   if (ret) {
+   DRV_LOG(ERR, "Cannot change Tx QP state to RTS %d\n", ret);
+   return ret;
+   }
+   txq->wqe_ci = 0;
+   txq->wqe_pi = 0;
+   txq->elts_comp = 0;
+   return 0;
+}
+
+/* Return 1 if the error CQE is signed; otherwise sign it and return 0. */
+static int
+check_err_cqe_seen(volatile struct mlx5_err_cqe *err_cqe)
+{
+   static const uint8_t magic[] = "seen";
+   int ret = 1;
+   unsigned int i;
+
+   for (i = 0; i < sizeof(magic); ++i)
+   if (!ret || err_cqe->rsvd1[i] != magic[i]) {
+   ret = 0;
+   err_cqe->rsvd1[i] = magic[i];
+   }
+   return ret;
+}
+
+/**
+ * Handle error CQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param err_cqe
+ *   Pointer to the error CQE.
+ *
+ * @return
+ *   The last Tx buffer element to free.
+ */
+uint16_t
+mlx5_tx_error_cqe_handle(struct mlx5_txq_data *txq,
+volatile struct mlx5_err_cqe *err_cqe)
+{
+   if (err_cqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR) {
+   const uint16_t wqe_m = ((1 << txq->wqe_n) - 1);
+   struct mlx5_txq_ctrl *txq_ctrl =
+

Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Bruce Richardson
On Wed, May 29, 2019 at 09:37:20PM +0100, Luca Boccassi wrote:
> On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote:
> > The first thing many developers do before starting to build DPDK is
> > disabling all the not needed drivers and libraries. This happens
> > just because more than a half of DPDK drivers and libraries are not
> > needed for the particular reason. For example, you don't need
> > dpaa*, octeon*, various crypto devices, eventdev, etc. if you
> > only want to build OVS for x86_64 with static linking.
> > 
> > By disabling everything you don't need, build speeds up literally 10x.
> > times. This is important for CI systems. For example, TravisCI wastes
> > 10 minutes for the default DPDK build just to check linking with OVS.
> > 
> > Another thing is the binary size. Number of DPDK libraries and,
> > as a result, size of resulted statically linked application decreases
> > significantly.
> > 
> > Important thing also that you're able to not install some
> > dependencies
> > if you don't have them on a target platform. Just disable
> > libs/drivers
> > that depends on it. Similar thing for the glibc version mismatch
> > between build and target platforms.
> > 
> > Also, I have to note that less code means less probability of
> > failures and less number of attack vectors.
> > 
> > This patch gives 'meson' the power of configurability that we
> > have with 'make'. Using new options it's possible to enable just
> > what you need and nothing more.
> > 
> > For example, following cmdline could be used to build almost minimal
> > set of DPDK libs and drivers to check OVS build:
> > 
> >   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \
> > -Ddrivers_bus=pci,vdev  \
> > -Ddrivers_mempool=ring  \
> > -Ddrivers_net=null,virtio,ring  \
> > -Ddrivers_crypto=virtio \
> > -Ddrivers_compress=none \
> > -Ddrivers_event=none\
> > -Ddrivers_baseband=none \
> > -Ddrivers_raw=none  \
> > -Ddrivers_common=none   \
> >
> > -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\
> >ethdev,pci,hash,cryptodev,pdump,vhost \
> > -Dapps=none
> > 
> > Adding a few real net drivers will give configuration that can be
> > used
> > in production environment.
> > 
> > Looks not very pretty, but this could be moved to a script.
> > 
> > Build details:
> > 
> >   Build targets in project: 57
> > 
> >   $ time ninja
> >   real 0m11,528s
> >   user 1m4,137s
> >   sys  0m4,935s
> > 
> >   $ du -sh ../dpdk_meson_install/
> >   3,5M ../dpdk_meson_install/
> > 
> > To compare with what we have without these options:
> > 
> >   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
> >   Build targets in project: 434
> > 
> >   $ time ninja
> >   real 1m38,963s
> >   user 10m18,624s
> >   sys  0m45,478s
> > 
> >   $ du -sh ../dpdk_meson_install/
> >   27M ../dpdk_meson_install/
> > 
> > 10x speed up for the user time.
> > 7.7 times size decrease.
> > 
> > This is probably not very user-friendly because it's not Kconfig
> > and dependency tracking in meson is really poor, so it usually
> > requires a few iterations to pick the correct set of libraries to
> > satisfy all dependencies. However, it's not a big deal. The options
> > are intended for proficient users who know what they need.
> 
> Hi,
> 
> We talked about this a few times in the past, and it was actually one
> of the design goals to _avoid_ replicating the octopus-like config
> system of the makefiles. That's because it makes the test matrix
> insanely complicated, not to mention the harm to user friendliness,
> among other things.
> 
> If someone doesn't want to use a PMD, they can just avoid installing it
> - it's simple enough.
> 
> Sorry, but from me it's a very strong NACK.
> 
I would agree with this position - tracking the dependencies of the
libraries etc. is a nightmare, and requires lots of ifdef'ery in the code
for handling cases where libraries don't exist.

However, I might be ok with limiting the drivers somewhat, since they don't
tend to depend on each other so much, though ideally I'd still prefer to
have one build of DPDK that has minimal configuration. If we say that we
can disable some drivers, though, the issue then becomes whether e.g. the bus
drivers could selectively be disabled, and the knock-on effects of that.
I'd hate to see the case where we end up having the meson.build files for
drivers becoming a massive list of conditional checks for a bunch of
internal dependencies. If someone is wanting to do a custom build of DPDK,
they can always patch out the subdirectories they don't want in the
meson.build files - but because of testing matrixes for such
configurations, I don't think it's something we want to explicitly support.

/Bruce


Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Luca Boccassi
On Thu, 2019-05-30 at 13:03 +0300, Ilya Maximets wrote:
> On 29.05.2019 23:37, Luca Boccassi wrote:
> > On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote:
> > > > The first thing many developers do before building DPDK is
> > > > disable all the unneeded drivers and libraries. This happens
> > > > just because more than half of the DPDK drivers and libraries are
> > > > not needed for a particular use case. For example, you don't need
> > > > dpaa*, octeon*, various crypto devices, eventdev, etc. if you
> > > > only want to build OVS for x86_64 with static linking.
> > > 
> > > By disabling everything you don't need, build speeds up literally
> > > 10x
> > > times. This is important for CI systems. For example, TravisCI
> > > wastes
> > > 10 minutes for the default DPDK build just to check linking with
> > > OVS.
> > > 
> > > Another thing is the binary size. Number of DPDK libraries and,
> > > as a result, size of resulted statically linked application
> > > decreases
> > > significantly.
> > > 
> > > Important thing also that you're able to not install some
> > > dependencies
> > > if you don't have them on a target platform. Just disable
> > > libs/drivers
> > > that depends on it. Similar thing for the glibc version mismatch
> > > between build and target platforms.
> > > 
> > > Also, I have to note that less code means less probability of
> > > failures and less number of attack vectors.
> > > 
> > > This patch gives 'meson' the power of configurability that we
> > > have with 'make'. Using new options it's possible to enable just
> > > what you need and nothing more.
> > > 
> > > For example, following cmdline could be used to build almost
> > > minimal
> > > set of DPDK libs and drivers to check OVS build:
> > > 
> > >   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
> > > \
> > > -Ddrivers_bus=pci,vdev  \
> > > -Ddrivers_mempool=ring  \
> > > -Ddrivers_net=null,virtio,ring  \
> > > -Ddrivers_crypto=virtio \
> > > -Ddrivers_compress=none \
> > > -Ddrivers_event=none\
> > > -Ddrivers_baseband=none \
> > > -Ddrivers_raw=none  \
> > > -Ddrivers_common=none   \
> > >
> > > -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\
> > >ethdev,pci,hash,cryptodev,pdump,vhost \
> > > -Dapps=none
> > > 
> > > Adding a few real net drivers will give configuration that can be
> > > used
> > > in production environment.
> > > 
> > > Looks not very pretty, but this could be moved to a script.
> > > 
> > > Build details:
> > > 
> > >   Build targets in project: 57
> > > 
> > >   $ time ninja
> > > >   real    0m11,528s
> > > >   user    1m4,137s
> > > >   sys     0m4,935s
> > > >
> > > >   $ du -sh ../dpdk_meson_install/
> > > >   3,5M    ../dpdk_meson_install/
> > > >
> > > > To compare with what we have without these options:
> > > >
> > > >   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
> > > >   Build targets in project: 434
> > > >
> > > >   $ time ninja
> > > >   real    1m38,963s
> > > >   user    10m18,624s
> > > >   sys     0m45,478s
> > > >
> > > >   $ du -sh ../dpdk_meson_install/
> > > >   27M     ../dpdk_meson_install/
> > > 
> > > 10x speed up for the user time.
> > > 7.7 times size decrease.
> > > 
> > > > This is probably not very user-friendly, because it's not Kconfig
> > > > and dependency tracking in meson is really poor, so it usually
> > > > takes a few iterations to pick the correct set of libraries to
> > > > satisfy all dependencies. However, it's not a big deal: the
> > > > options are intended for proficient users who know what they need.
> > 
> > Hi,
> > 
> > We talked about this a few times in the past, and it was actually
> > one
> > of the design goals to _avoid_ replicating the octopus-like config
> > system of the makefiles. That's because it makes the test matrix
> > insanely complicated, not to mention the harm to user friendliness,
> > among other things.
> > 
> > If someone doesn't want to use a PMD, they can just avoid
> > installing it
> > - it's simple enough.
> 
> So how can I do this? I don't think 'ninja install' has such option.
> Also, if you think that it is safe to skip some libs/drivers in
> installation
> process, it must be safe to not build them at all. It's just a waste
> of
> time and computational resources to build something known to be not
> used.
> And if you're going to ship DPDK libraries separately in distros,
> you'll
> have to test their different combinations anyway. If they're so
> independent
> > that you don't need to test them in various combinations, then your
> point
> about test matrix is not valid.

It can be done in the packaging step, or post-install if there's no
packaging. An operating system vendor is free to do its own test and
support plan, and decide to leave out som

Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Ilya Maximets
On 30.05.2019 13:22, Bruce Richardson wrote:
> On Wed, May 29, 2019 at 09:37:20PM +0100, Luca Boccassi wrote:
>> On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote:
>>> The first thing many developers do before building DPDK is
>>> disable all the unneeded drivers and libraries. This happens
>>> just because more than half of the DPDK drivers and libraries are not
>>> needed for a particular use case. For example, you don't need
>>> dpaa*, octeon*, various crypto devices, eventdev, etc. if you
>>> only want to build OVS for x86_64 with static linking.
>>>
>>> By disabling everything you don't need, build speeds up literally 10x
>>> times. This is important for CI systems. For example, TravisCI wastes
>>> 10 minutes for the default DPDK build just to check linking with OVS.
>>>
>>> Another thing is the binary size. Number of DPDK libraries and,
>>> as a result, size of resulted statically linked application decreases
>>> significantly.
>>>
>>> Important thing also that you're able to not install some
>>> dependencies
>>> if you don't have them on a target platform. Just disable
>>> libs/drivers
>>> that depends on it. Similar thing for the glibc version mismatch
>>> between build and target platforms.
>>>
>>> Also, I have to note that less code means less probability of
>>> failures and less number of attack vectors.
>>>
>>> This patch gives 'meson' the power of configurability that we
>>> have with 'make'. Using new options it's possible to enable just
>>> what you need and nothing more.
>>>
>>> For example, following cmdline could be used to build almost minimal
>>> set of DPDK libs and drivers to check OVS build:
>>>
>>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \
>>> -Ddrivers_bus=pci,vdev  \
>>> -Ddrivers_mempool=ring  \
>>> -Ddrivers_net=null,virtio,ring  \
>>> -Ddrivers_crypto=virtio \
>>> -Ddrivers_compress=none \
>>> -Ddrivers_event=none\
>>> -Ddrivers_baseband=none \
>>> -Ddrivers_raw=none  \
>>> -Ddrivers_common=none   \
>>>
>>> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\
>>>ethdev,pci,hash,cryptodev,pdump,vhost \
>>> -Dapps=none
>>>
>>> Adding a few real net drivers will give configuration that can be
>>> used
>>> in production environment.
>>>
>>> Looks not very pretty, but this could be moved to a script.
>>>
>>> Build details:
>>>
>>>   Build targets in project: 57
>>>
>>>   $ time ninja
>>>   real    0m11,528s
>>>   user    1m4,137s
>>>   sys     0m4,935s
>>>
>>>   $ du -sh ../dpdk_meson_install/
>>>   3,5M    ../dpdk_meson_install/
>>>
>>> To compare with what we have without these options:
>>>
>>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
>>>   Build targets in project: 434
>>>
>>>   $ time ninja
>>>   real    1m38,963s
>>>   user    10m18,624s
>>>   sys     0m45,478s
>>>
>>>   $ du -sh ../dpdk_meson_install/
>>>   27M     ../dpdk_meson_install/
>>>
>>> 10x speed up for the user time.
>>> 7.7 times size decrease.
>>>
>>> This is probably not very user-friendly, because it's not Kconfig
>>> and dependency tracking in meson is really poor, so it usually
>>> takes a few iterations to pick the correct set of libraries to
>>> satisfy all dependencies. However, it's not a big deal: the options
>>> are intended for proficient users who know what they need.
>>
>> Hi,
>>
>> We talked about this a few times in the past, and it was actually one
>> of the design goals to _avoid_ replicating the octopus-like config
>> system of the makefiles. That's because it makes the test matrix
>> insanely complicated, not to mention the harm to user friendliness,
>> among other things.
>>
>> If someone doesn't want to use a PMD, they can just avoid installing it
>> - it's simple enough.
>>
>> Sorry, but from me it's a very strong NACK.
>>
> I would agree with this position - tracking the dependencies of the
> libraries etc. is a nightmare, and requires lots of ifdef'ery in the code
> for handling cases where libraries don't exist.
> 
> However, I might be ok with limiting the drivers somewhat, since they don't
> tend to depend on each other so much, though ideally I'd still prefer to
> have one build of DPDK that has minimal configuration. If we say that we
> can disable some drivers, though, the issue then becomes whether e.g. the bus
> drivers could selectively be disabled, and the knock-on effects of that.
> I'd hate to see the case where we end up having the meson.build files for
> drivers becoming a massive list of conditional checks for a bunch of
> internal dependencies. If someone is wanting to do a custom build of DPDK,
> they can always patch out the subdirectories they don't want in the
> meson.build files - but because of testing matrixes for such
> configurations, I don't 

Re: [dpdk-dev] [PATCH 1/2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Bruce Richardson
On Wed, May 29, 2019 at 07:39:57PM +0300, Ilya Maximets wrote:
> Don't need to check dependencies if test apps will not be built anyway.
> 
> Signed-off-by: Ilya Maximets 
> ---
>  app/test/meson.build | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
Agree with the idea.

Would this work as a shorter alternative placed at the top of the file?

if not get_option('tests')
        subdir_done()
endif

/Bruce


Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Ilya Maximets
On 30.05.2019 14:06, Luca Boccassi wrote:
> On Thu, 2019-05-30 at 13:03 +0300, Ilya Maximets wrote:
>> On 29.05.2019 23:37, Luca Boccassi wrote:
>>> On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote:
>>>> The first thing many developers do before building DPDK is
>>>> disable all the unneeded drivers and libraries. This happens
>>>> just because more than half of the DPDK drivers and libraries are
>>>> not needed for a particular use case. For example, you don't need
>>>> dpaa*, octeon*, various crypto devices, eventdev, etc. if you
>>>> only want to build OVS for x86_64 with static linking.
>>>>
>>>> By disabling everything you don't need, build speeds up literally
>>>> 10x
>>>> times. This is important for CI systems. For example, TravisCI
>>>> wastes
>>>> 10 minutes for the default DPDK build just to check linking with
>>>> OVS.
>>>>
>>>> Another thing is the binary size. Number of DPDK libraries and,
>>>> as a result, size of resulted statically linked application
>>>> decreases
>>>> significantly.
>>>>
>>>> Important thing also that you're able to not install some
>>>> dependencies
>>>> if you don't have them on a target platform. Just disable
>>>> libs/drivers
>>>> that depends on it. Similar thing for the glibc version mismatch
>>>> between build and target platforms.
>>>>
>>>> Also, I have to note that less code means less probability of
>>>> failures and less number of attack vectors.
>>>>
>>>> This patch gives 'meson' the power of configurability that we
>>>> have with 'make'. Using new options it's possible to enable just
>>>> what you need and nothing more.
>>>>
>>>> For example, following cmdline could be used to build almost
>>>> minimal
>>>> set of DPDK libs and drivers to check OVS build:
>>>>
>>>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
>>>> \
>>>> -Ddrivers_bus=pci,vdev  \
>>>> -Ddrivers_mempool=ring  \
>>>> -Ddrivers_net=null,virtio,ring  \
>>>> -Ddrivers_crypto=virtio \
>>>> -Ddrivers_compress=none \
>>>> -Ddrivers_event=none\
>>>> -Ddrivers_baseband=none \
>>>> -Ddrivers_raw=none  \
>>>> -Ddrivers_common=none   \
>>>>
>>>> -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,\
>>>>        ethdev,pci,hash,cryptodev,pdump,vhost \
>>>> -Dapps=none
>>>>
>>>> Adding a few real net drivers will give configuration that can be
>>>> used
>>>> in production environment.
>>>>
>>>> Looks not very pretty, but this could be moved to a script.
>>>>
>>>> Build details:
>>>>
>>>>   Build targets in project: 57
>>>>
>>>>   $ time ninja
>>>>   real    0m11,528s
>>>>   user    1m4,137s
>>>>   sys     0m4,935s
>>>>
>>>>   $ du -sh ../dpdk_meson_install/
>>>>   3,5M    ../dpdk_meson_install/
>>>>
>>>> To compare with what we have without these options:
>>>>
>>>>   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
>>>>   Build targets in project: 434
>>>>
>>>>   $ time ninja
>>>>   real    1m38,963s
>>>>   user    10m18,624s
>>>>   sys     0m45,478s
>>>>
>>>>   $ du -sh ../dpdk_meson_install/
>>>>   27M     ../dpdk_meson_install/
>>>>
>>>> 10x speed up for the user time.
>>>> 7.7 times size decrease.
>>>>
>>>> This is probably not very user-friendly, because it's not Kconfig
>>>> and dependency tracking in meson is really poor, so it usually
>>>> takes a few iterations to pick the correct set of libraries to
>>>> satisfy all dependencies. However, it's not a big deal: the options
>>>> are intended for proficient users who know what they need.
>>>
>>> Hi,
>>>
>>> We talked about this a few times in the past, and it was actually
>>> one
>>> of the design goals to _avoid_ replicating the octopus-like config
>>> system of the makefiles. That's because it makes the test matrix
>>> insanely complicated, not to mention the harm to user friendliness,
>>> among other things.
>>>
>>> If someone doesn't want to use a PMD, they can just avoid
>>> installing it
>>> - it's simple enough.
>>
>> So how can I do this? I don't think 'ninja install' has such option.
>> Also, if you think that it is safe to skip some libs/drivers in
>> installation
>> process, it must be safe to not build them at all. It's just a waste
>> of
>> time and computational resources to build something known to be not
>> used.
>> And if you're going to ship DPDK libraries separately in distros,
>> you'll
>> have to test their different combinations anyway. If they're so
>> independent
>> that you don't need to test them in various combinations, then your
>> point
>> about test matrix is not valid.
> 
> It can be done in the packaging step, or post-install if there's no
> packaging. An operating system vendor is free to do its own test and
> support plan, and decide to leave out some PMDs from it.

This technically means doing this

Re: [dpdk-dev] 18.11.2 (LTS) patches review and test

2019-05-30 Thread Ali Alnubani
> -----Original Message-----
> From: Ian Stokes 
> Sent: Thursday, May 30, 2019 11:16 AM
> To: Kevin Traynor ; dpdk stable 
> Cc: dev@dpdk.org; Sitong Liu ; Pei Zhang
> ; Raslan Darawsheh ;
> qian.q...@intel.com; Ju-Hyoung Lee ; Ali Alnubani
> ; David Christensen ;
> benjamin.wal...@intel.com; Thomas Monjalon ;
> John McNamara ; Luca Boccassi
> ; Jerin Jacob Kollanukkaran ;
> Hemant Agrawal ; Akhil Goyal
> 
> Subject: Re: 18.11.2 (LTS) patches review and test
> 
> On 5/21/2019 3:01 PM, Kevin Traynor wrote:
> > Hi all,
> >
> > Here is a list of patches targeted for LTS release 18.11.2.
> >
> > The planned date for the final release is 11th June.
> >
> > Please help with testing and validation of your use cases and report
> > any issues/results. For the final release I will update the release
> > notes with fixes and reported validations.



> >
> > Thanks.
> >
> > Kevin Traynor
> >
> 
> Hi Kevin,
> 
> I've validated with current head OVS Master and OVS 2.11.1 with VSPERF.
> Tested with i40e (X710), i40eVF, ixgbe (82599ES), ixgbeVF, igb(I350) and
> igbVF devices.
> 
> Following tests were conducted and passed:
> 
> * vswitch_p2p_tput: vSwitch - configure switch and execute RFC2544
> throughput test.
> * vswitch_p2p_cont: vSwitch - configure switch and execute RFC2544
> continuous stream test.
> * vswitch_pvp_tput: vSwitch - configure switch, vnf and execute RFC2544
> throughput test.
> * vswitch_pvp_cont: vSwitch - configure switch, vnf and execute RFC2544
> continuous stream test.
> * ovsdpdk_hotplug_attach: Ensure successful port-add after binding a device
> to igb_uio after ovs-vswitchd is launched.
> * ovsdpdk_mq_p2p_rxqs: Setup rxqs on NIC port.
> * ovsdpdk_mq_pvp_rxqs: Setup rxqs on vhost user port.
> * ovsdpdk_mq_pvp_rxqs_linux_bridge: Confirm traffic received over vhost
> RXQs with Linux virtio device in guest.
> * ovsdpdk_mq_pvp_rxqs_testpmd: Confirm traffic received over vhost RXQs
> with DPDK device in guest.
> * ovsdpdk_vhostuser_client: Test vhost-user client mode.
> * ovsdpdk_vhostuser_client_reconnect: Test vhost-user client mode
> reconnect feature.
> * ovsdpdk_vhostuser_server: Test vhost-user server mode.
> * ovsdpdk_vhostuser_sock_dir: Verify functionality of vhost-sock-dir flag.
> * ovsdpdk_vdev_add_null_pmd: Test addition of port using the null DPDK
> PMD driver.
> * ovsdpdk_vdev_del_null_pmd: Test deletion of port using the null DPDK
> PMD driver.
> * ovsdpdk_vdev_add_af_packet_pmd: Test addition of port using the
> af_packet DPDK PMD driver.
> * ovsdpdk_vdev_del_af_packet_pmd: Test deletion of port using the
> af_packet DPDK PMD driver.
> * ovsdpdk_numa: Test vhost-user NUMA support. Vhostuser PMD threads
> should migrate to the same numa slot, where QEMU is executed.
> * ovsdpdk_jumbo_p2p: Ensure that jumbo frames are received, processed
> and forwarded correctly by DPDK physical ports.
> * ovsdpdk_jumbo_pvp: Ensure that jumbo frames are received, processed
> and forwarded correctly by DPDK vhost-user ports.
> * ovsdpdk_jumbo_p2p_upper_bound: Ensure that jumbo frames above the
> configured Rx port's MTU are not accepted.
> * ovsdpdk_jumbo_mtu_upper_bound_vport: Verify that the upper bound
> limit is enforced for OvS DPDK vhost-user ports.
> * ovsdpdk_rate_p2p: Ensure when a user creates a rate limiting physical
> interface that the traffic is limited to the specified policer rate in a p2p 
> setup.
> * ovsdpdk_rate_pvp: Ensure when a user creates a rate limiting vHost User
> interface that the traffic is limited to the specified policer rate in a pvp 
> setup.
> * ovsdpdk_qos_p2p: In a p2p setup, ensure when a QoS egress policer is
> created that the traffic is limited to the specified rate.
> * ovsdpdk_qos_pvp: In a pvp setup, ensure when a QoS egress policer is
> created that the traffic is limited to the specified rate.
> * phy2phy_scalability: LTD.Scalability.Flows.RFC2544.0PacketLoss
> * phy2phy_scalability_cont: Phy2Phy Scalability Continuous Stream
> * pvp_cont: PVP Continuous Stream
> * pvvp_cont: PVVP Continuous Stream
> * pvpv_cont: Two VMs in parallel with Continuous Stream
> 
> Regards
> Ian

Hi,

I validated this version and sent our testing matrix:
http://mails.dpdk.org/archives/stable/2019-May/015198.html

Thanks,
Ali


Re: [dpdk-dev] [PATCH 1/2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Ilya Maximets
On 30.05.2019 14:55, Bruce Richardson wrote:
> On Wed, May 29, 2019 at 07:39:57PM +0300, Ilya Maximets wrote:
>> Don't need to check dependencies if test apps will not be built anyway.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  app/test/meson.build | 38 +++---
>>  1 file changed, 19 insertions(+), 19 deletions(-)
>>
> Agree with the idea.
> 
> Would this work as a shorter alternative placed at the top of the file?
> 
> if not get_option('tests')
>   subdir_done()
> endif

This looks good to me.
However, the resulting patch will be much larger because we'll have to
shift most of it to the left. If it's OK, I'll prepare v2 with this change.
What do you think?

Best regards, Ilya Maximets.


Re: [dpdk-dev] [PATCH 1/2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Bruce Richardson
On Thu, May 30, 2019 at 03:06:17PM +0300, Ilya Maximets wrote:
> On 30.05.2019 14:55, Bruce Richardson wrote:
> > On Wed, May 29, 2019 at 07:39:57PM +0300, Ilya Maximets wrote:
> >> Don't need to check dependencies if test apps will not be built anyway.
> >>
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>  app/test/meson.build | 38 +++---
> >>  1 file changed, 19 insertions(+), 19 deletions(-)
> >>
> > Agree with the idea.
> > 
> > Would this work as a shorter alternative placed at the top of the file?
> > 
> > if not get_option('tests')
> > >   subdir_done()
> > endif
> 
> This looks good to me.
> However, the resulting patch will be much larger because we'll have to
> shift most of it to the left. If it's OK, I'll prepare v2 with this change.
> What do you think?
> 
Yes, there will be some left-shifting, but it should just be a single block
from lines 338-419, which is probably ok. The end result is better, I
think.


Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)

2019-05-30 Thread Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)


> -----Original Message-----
> From: Ferruh Yigit 
> Sent: Wednesday, May 29, 2019 7:29 PM
> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
> ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)

> > +
> > +.. csv-table:: **Memif configuration options**
> > +   :header: "Option", "Description", "Default", "Valid value"
> > +
> > +   "id=0", "Used to identify peer interface", "0", "uint32_t"
> > +   "role=master", "Set memif role", "slave", "master|slave"
> > +   "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
> 
> What happens if 'bsize < mbuf size'? I didn't see any check in the code but is
> there any assumption around this?
> Or any assumption that slave and master packet should be same? Or any
> other relation?
> If there is any assumption it may be good to add checks to the code and
> document here.

There is no relation between bsize and mbuf size. The memif driver consumes as
many buffers as a packet needs, chaining them.
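
As a rough illustration of that chaining, the number of memif buffers a
packet occupies is just the packet length divided by bsize, rounded up.
This is only a sketch, not driver code: memif_bufs_needed() is a
hypothetical helper name, and treating a zero bsize as "no buffers" is an
assumption made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the memif PMD: number of bsize-sized
 * buffers needed to carry pkt_len bytes when buffers are chained.
 */
static uint32_t
memif_bufs_needed(uint32_t pkt_len, uint16_t bsize)
{
	if (bsize == 0)
		return 0; /* assumption: an invalid bsize stores nothing */

	/* Ceiling division, widened to 64 bits so the sum cannot overflow. */
	return (uint32_t)(((uint64_t)pkt_len + bsize - 1) / bsize);
}
```

With bsize=1024, a 64-byte packet fits in one buffer while a 1500-byte
packet is split across two chained buffers, independent of the mbuf size
on either side.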

> > +#ifndef _RTE_ETH_MEMIF_H_
> > +#define _RTE_ETH_MEMIF_H_
> > +
> > +#ifndef _GNU_SOURCE
> > +#define _GNU_SOURCE
> > +#endif /* GNU_SOURCE */
> 
> Why this was required?

_GNU_SOURCE is required by memfd_create().


[dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Ilya Maximets
Don't need to check dependencies if test apps will not be built anyway.

Signed-off-by: Ilya Maximets 
---

Version 2:
  - 'get_option('tests')' check moved to the top.

 app/test/meson.build | 141 ++-
 1 file changed, 72 insertions(+), 69 deletions(-)

diff --git a/app/test/meson.build b/app/test/meson.build
index 83391cef0..4de856f93 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -1,6 +1,10 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+if not get_option('tests')
+   subdir_done()
+endif
+
 test_sources = files('commands.c',
'packet_burst_generator.c',
'sample_packet_forward.c',
@@ -335,86 +339,85 @@ if get_option('default_library') == 'static'
link_libs = dpdk_drivers
 endif
 
-if get_option('tests')
-   dpdk_test = executable('dpdk-test',
-   test_sources,
-   link_whole: link_libs,
-   dependencies: test_dep_objs,
-   c_args: [cflags, '-DALLOW_EXPERIMENTAL_API'],
-   install_rpath: driver_install_path,
-   install: false)
+dpdk_test = executable('dpdk-test',
+   test_sources,
+   link_whole: link_libs,
+   dependencies: test_dep_objs,
+   c_args: [cflags, '-DALLOW_EXPERIMENTAL_API'],
+   install_rpath: driver_install_path,
+   install: false)
 
-   # some perf tests (eg: memcpy perf autotest)take very long
-   # to complete, so timeout to 10 minutes
-   timeout_seconds = 600
-   timeout_seconds_fast = 10
-
-   # Retrieve the number of CPU cores, defaulting to 4.
-   num_cores = '0-3'
-   if host_machine.system() == 'linux'
-   num_cores = run_command('cat',
-   '/sys/devices/system/cpu/present'
-  ).stdout().strip()
-   elif host_machine.system() == 'freebsd'
-   snum_cores = run_command('/sbin/sysctl', '-n',
-'hw.ncpu').stdout().strip()
-   inum_cores = snum_cores.to_int() - 1
-num_cores = '0-@0@'.format(inum_cores)
-   endif
+# some perf tests (eg: memcpy perf autotest)take very long
+# to complete, so timeout to 10 minutes
+timeout_seconds = 600
+timeout_seconds_fast = 10
 
-   num_cores_arg = '-l ' + num_cores
+# Retrieve the number of CPU cores, defaulting to 4.
+num_cores = '0-3'
+if host_machine.system() == 'linux'
+   num_cores = run_command('cat',
+   '/sys/devices/system/cpu/present'
+  ).stdout().strip()
+elif host_machine.system() == 'freebsd'
+   snum_cores = run_command('/sbin/sysctl', '-n',
+'hw.ncpu').stdout().strip()
+   inum_cores = snum_cores.to_int() - 1
+num_cores = '0-@0@'.format(inum_cores)
+endif
 
-   test_args = [num_cores_arg, '-n 4']
-   foreach arg : fast_parallel_test_names
-   if host_machine.system() == 'linux'
-   test(arg, dpdk_test,
- env : ['DPDK_TEST=' + arg],
- args : test_args +
-['--file-prefix=@0@'.format(arg)],
-   timeout : timeout_seconds_fast,
-   suite : 'fast-tests')
-   else
-   test(arg, dpdk_test,
-   env : ['DPDK_TEST=' + arg],
-   args : test_args,
-   timeout : timeout_seconds_fast,
-   suite : 'fast-tests')
-   endif
-   endforeach
+num_cores_arg = '-l ' + num_cores
 
-   foreach arg : fast_non_parallel_test_names
+test_args = [num_cores_arg, '-n 4']
+foreach arg : fast_parallel_test_names
+   if host_machine.system() == 'linux'
+   test(arg, dpdk_test,
+ env : ['DPDK_TEST=' + arg],
+ args : test_args +
+['--file-prefix=@0@'.format(arg)],
+   timeout : timeout_seconds_fast,
+   suite : 'fast-tests')
+   else
test(arg, dpdk_test,
env : ['DPDK_TEST=' + arg],
args : test_args,
-   timeout : timeout_seconds_fast,
-   is_parallel : false,
-   suite : 'fast-tests')
-   endforeach
+   timeout : timeout_seconds_fast,
+   suite : 'fast-tests')
+   endif
+endforeach
 
-   foreach arg : perf_test_names
-   test(arg, dpdk_test,
+foreach arg : fast_non_parallel_test_names
+   test(arg, dpdk_test,
+   env : ['DPDK_TEST=' + arg],
+   args : test_args,
+   timeout : timeout_seconds_fast,
+   is_parallel : false,
+   suite : 'fast-tests')
+endforeach
+
+foreach

Re: [dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Bruce Richardson
On Thu, May 30, 2019 at 03:38:36PM +0300, Ilya Maximets wrote:
> Don't need to check dependencies if test apps will not be built anyway.
> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> Version 2:
>   - 'get_option('tests')' check moved to the top.
> 
>  app/test/meson.build | 141 ++-
>  1 file changed, 72 insertions(+), 69 deletions(-)
> 
Acked-by: Bruce Richardson 


[dpdk-dev] [PATCH 1/2] eventdev: replace mbufs with events in Rx callback

2019-05-30 Thread Nikhil Rao
Replace the mbuf pointer array in the event eth Rx adapter
callback with an event array. Using an event array allows
the application to change attributes of the events enqueued
by the SW adapter.

Signed-off-by: Nikhil Rao 
---
 lib/librte_eventdev/rte_event_eth_rx_adapter.h | 57 +++---
 lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 ---
 2 files changed, 52 insertions(+), 37 deletions(-)

This patch depends on
http://patchwork.dpdk.org/patch/53614/

v1:
* add implementation to RFC

diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
index 2314b93..a64eed0 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
@@ -66,16 +66,17 @@
  * For SW based packet transfers, i.e., when the
  * RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT is not set in the adapter's
  * capabilities flags for a particular ethernet device, the service function
- * temporarily enqueues mbufs to an event buffer before batch enqueueing these
+ * temporarily enqueues events to an event buffer before batch enqueueing these
  * to the event device. If the buffer fills up, the service function stops
  * dequeueing packets from the ethernet device. The application may want to
  * monitor the buffer fill level and instruct the service function to
- * selectively buffer packets. The application may also use some other
+ * selectively buffer events. The application may also use some other
  * criteria to decide which packets should enter the event device even when
- * the event buffer fill level is low. The
- * rte_event_eth_rx_adapter_cb_register() function allows the
- * application to register a callback that selects which packets to enqueue
- * to the event device.
+ * the event buffer fill level is low or may want to enqueue packets to an
+ * internal event port. The rte_event_eth_rx_adapter_cb_register() function
+ * allows the application to register a callback that selects which packets are
+ * enqueued to the event device by the SW adapter. The callback interface is
+ * event based so the callback can also modify the event data if it needs to.
  */
 
 #ifdef __cplusplus
@@ -217,12 +218,23 @@ struct rte_event_eth_rx_adapter_stats {
  * @b EXPERIMENTAL: this API may change without prior notice
  *
  * Callback function invoked by the SW adapter before it continues
- * to process packets. The callback is passed the size of the enqueue
+ * to process events. The callback is passed the size of the enqueue
  * buffer in the SW adapter and the occupancy of the buffer. The
- * callback can use these values to decide which mbufs should be
- * enqueued to the event device. If the return value of the callback
- * is less than nb_mbuf then the SW adapter uses the return value to
- * enqueue enq_mbuf[] to the event device.
+ * callback can use these values to decide which events are
+ * enqueued to the event device by the SW adapter. The callback may
+ * also enqueue events internally using its own event port. The SW
+ * adapter populates the event information based on the Rx queue
+ * configuration in the adapter. The callback can modify this event
+ * information for the events to be enqueued by the SW adapter.
+ *
+ * The callback return value is the number of events from the
+ * beginning of the event array that are to be enqueued by
+ * the SW adapter. It is the callback's responsibility to arrange
+ * these events at the beginning of the array, if these events are
+ * not contiguous in the original array. The *nb_dropped* parameter is
+ * a pointer to the number of events dropped by the callback, this
+ * number is used by the adapter to indicate the number of dropped packets
+ * as part of its statistics.
  *
  * @param eth_dev_id
  *  Port identifier of the Ethernet device.
@@ -231,27 +243,26 @@ struct rte_event_eth_rx_adapter_stats {
  * @param enqueue_buf_size
  *  Total enqueue buffer size.
  * @param enqueue_buf_count
- *  mbuf count in enqueue buffer.
- * @param mbuf
- *  mbuf array.
- * @param nb_mbuf
- *  mbuf count.
+ *  Event count in enqueue buffer.
+ * @param[in, out] ev
+ *  Event array.
+ * @param nb_event
+ *  Event array length.
  * @param cb_arg
  *  Callback argument.
- * @param[out] enq_mbuf
- *  The adapter enqueues enq_mbuf[] if the return value of the
- *  callback is less than nb_mbuf
+ * @param[out] nb_dropped
+ *  Packets dropped by callback.
  * @return
- *  Returns the number of mbufs should be enqueued to eventdev
+ *  - The number of events to be enqueued by the SW adapter.
  */
 typedef uint16_t (*rte_event_eth_rx_adapter_cb_fn)(uint16_t eth_dev_id,
uint16_t queue_id,
uint32_t enqueue_buf_size,
uint32_t enqueue_buf_count,
-   struct rte_mbuf **mbuf,
-  

Re: [dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Aaron Conole
Ilya Maximets  writes:

> Don't need to check dependencies if test apps will not be built anyway.
>
> Signed-off-by: Ilya Maximets 
> ---
>
> Version 2:
>   - 'get_option('tests')' check moved to the top.
>
>  app/test/meson.build | 141 ++-
>  1 file changed, 72 insertions(+), 69 deletions(-)
>

Acked-by: Aaron Conole 

Thanks for this, Ilya!


[dpdk-dev] [PATCH 2/2] eventdev: add dropped count to Rx adapter stats

2019-05-30 Thread Nikhil Rao
The application can install a callback invoked by
the Rx adapter. The callback can drop packets and populate
a callback argument with the number of dropped packets.
Add a Rx adapter stats field to keep track of the total
number of dropped packets.

Signed-off-by: Nikhil Rao 
---
 lib/librte_eventdev/rte_event_eth_rx_adapter.h | 2 ++
 lib/librte_eventdev/rte_event_eth_rx_adapter.c | 3 +++
 2 files changed, 5 insertions(+)
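
For readers following the accounting, here is a minimal standalone sketch
of the adapter-side logic in the second hunk below. It assumes simplified
stand-in types: sketch_event, sketch_stats, sketch_cb_fn, sketch_run_cb and
sketch_drop_odd are hypothetical names used only for illustration, not the
rte_event_eth_rx_adapter definitions. It shows why the drop counter is
zeroed before each callback invocation and then folded into a running total.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the DPDK definitions. */
struct sketch_event {
	uint32_t flow_id;
	void *mbuf;
};

struct sketch_stats {
	uint64_t rx_dropped;	/* running total of callback drops */
};

typedef uint16_t (*sketch_cb_fn)(struct sketch_event *ev, uint16_t nb_event,
				 void *cb_arg, uint16_t *nb_dropped);

/* Run the callback over one burst and fold its drop count into the stats. */
static uint16_t
sketch_run_cb(sketch_cb_fn cb, void *cb_arg, struct sketch_event *ev,
	      uint16_t num, struct sketch_stats *stats)
{
	uint16_t dropped = 0;	/* reset per burst: the callback may not write it */
	uint16_t nb_cb;

	if (cb == NULL)
		return num;

	nb_cb = cb(ev, num, cb_arg, &dropped);
	if (nb_cb > num)	/* never enqueue more than the burst held */
		nb_cb = num;

	stats->rx_dropped += dropped;
	return nb_cb;
}

/* Example callback: drop every event with an odd flow_id. */
static uint16_t
sketch_drop_odd(struct sketch_event *ev, uint16_t nb_event, void *cb_arg,
		uint16_t *nb_dropped)
{
	uint16_t i, kept = 0;

	(void)cb_arg;
	for (i = 0; i < nb_event; i++) {
		if ((ev[i].flow_id & 1) == 0)
			ev[kept++] = ev[i];	/* compact survivors to the front */
	}
	*nb_dropped = (uint16_t)(nb_event - kept);
	return kept;
}
```

A real callback would also have to release the mbufs of the events it
drops; the sketch leaves that out.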

diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
index a64eed0..4ea5a53 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
@@ -197,6 +197,8 @@ struct rte_event_eth_rx_adapter_stats {
/**< Eventdev enqueue count */
uint64_t rx_enq_retry;
/**< Eventdev enqueue retry count */
+   uint64_t rx_dropped;
+   /**< Received packet dropped count */
uint64_t rx_enq_start_ts;
/**< Rx enqueue start timestamp */
uint64_t rx_enq_block_cycles;
diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c
index ab4e3cf..4d41aa7 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c
@@ -807,6 +807,7 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b)
 
if (dev_info->cb_fn) {
 
+   dropped = 0;
nb_cb = dev_info->cb_fn(eth_dev_id,
rx_queue_id,
ETH_EVENT_BUFFER_SIZE,
@@ -820,6 +821,8 @@ static uint16_t rxa_gcd_u16(uint16_t a, uint16_t b)
nb_cb, num);
else
num = nb_cb;
+   if (dropped)
+   rx_adapter->stats.rx_dropped += dropped;
}
 
buf->count += num;
-- 
1.8.3.1



[dpdk-dev] [PATCH 1/2] eventdev: replace mbufs with events in Rx callback

2019-05-30 Thread Nikhil Rao
Replace the mbuf pointer array in the event eth Rx adapter
callback with an event array instead of an mbuf array. Using
an event array allows the application to change attributes
of the events enqueued by the SW adapter.

Signed-off-by: Nikhil Rao 
---
 lib/librte_eventdev/rte_event_eth_rx_adapter.h | 57 +++---
 lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 ---
 2 files changed, 52 insertions(+), 37 deletions(-)

This patch depends on
http://patchwork.dpdk.org/patch/53614/

Resending - the previous attempt only sent the first patch.

v1:
* add implementation to RFC
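
To make the new contract concrete, below is a minimal sketch of an
event-based callback. It assumes stand-in types: sketch_event and
sketch_cb_ctx are hypothetical, and the parameter order follows the @param
list in the doc comment of this patch (the typedef itself is cut off in
this mail, so treat the exact prototype as an assumption). The callback
uses the buffer occupancy to decide when to shed load, moves the events it
keeps to the front of the array, and reports the rest through nb_dropped.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_event; only the fields used here. */
struct sketch_event {
	uint32_t flow_id;
	uint8_t priority;	/* lower value = more urgent */
};

/* Hypothetical per-callback state passed through cb_arg. */
struct sketch_cb_ctx {
	uint32_t high_watermark;	/* occupancy above which we shed load */
	uint8_t max_priority;		/* keep events with priority <= this */
};

static uint16_t
sketch_rx_cb(uint16_t eth_dev_id, uint16_t queue_id,
	     uint32_t enqueue_buf_size, uint32_t enqueue_buf_count,
	     struct sketch_event *ev, uint16_t nb_event,
	     void *cb_arg, uint16_t *nb_dropped)
{
	const struct sketch_cb_ctx *ctx = cb_arg;
	uint16_t i, kept = 0;

	(void)eth_dev_id;
	(void)queue_id;
	(void)enqueue_buf_size;

	*nb_dropped = 0;
	if (enqueue_buf_count <= ctx->high_watermark)
		return nb_event;	/* buffer has room: keep everything */

	for (i = 0; i < nb_event; i++) {
		if (ev[i].priority <= ctx->max_priority)
			ev[kept++] = ev[i];	/* arrange survivors at the front */
	}
	*nb_dropped = (uint16_t)(nb_event - kept);
	return kept;
}
```

The return value is the length of the prefix the adapter should enqueue,
which is why the survivors have to be contiguous at the start of the array.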

diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
index 2314b93..a64eed0 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
@@ -66,16 +66,17 @@
  * For SW based packet transfers, i.e., when the
  * RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT is not set in the adapter's
  * capabilities flags for a particular ethernet device, the service function
- * temporarily enqueues mbufs to an event buffer before batch enqueueing these
+ * temporarily enqueues events to an event buffer before batch enqueueing these
  * to the event device. If the buffer fills up, the service function stops
  * dequeueing packets from the ethernet device. The application may want to
  * monitor the buffer fill level and instruct the service function to
- * selectively buffer packets. The application may also use some other
+ * selectively buffer events. The application may also use some other
  * criteria to decide which packets should enter the event device even when
- * the event buffer fill level is low. The
- * rte_event_eth_rx_adapter_cb_register() function allows the
- * application to register a callback that selects which packets to enqueue
- * to the event device.
+ * the event buffer fill level is low or may want to enqueue packets to an
+ * internal event port. The rte_event_eth_rx_adapter_cb_register() function
+ * allows the application to register a callback that selects which packets are
+ * enqueued to the event device by the SW adapter. The callback interface is
+ * event based so the callback can also modify the event data if it needs to.
  */
 
 #ifdef __cplusplus
@@ -217,12 +218,23 @@ struct rte_event_eth_rx_adapter_stats {
  * @b EXPERIMENTAL: this API may change without prior notice
  *
  * Callback function invoked by the SW adapter before it continues
- * to process packets. The callback is passed the size of the enqueue
+ * to process events. The callback is passed the size of the enqueue
  * buffer in the SW adapter and the occupancy of the buffer. The
- * callback can use these values to decide which mbufs should be
- * enqueued to the event device. If the return value of the callback
- * is less than nb_mbuf then the SW adapter uses the return value to
- * enqueue enq_mbuf[] to the event device.
+ * callback can use these values to decide which events are
+ * enqueued to the event device by the SW adapter. The callback may
+ * also enqueue events internally using its own event port. The SW
+ * adapter populates the event information based on the Rx queue
+ * configuration in the adapter. The callback can modify this event
+ * information for the events to be enqueued by the SW adapter.
+ *
+ * The callback return value is the number of events from the
+ * beginning of the event array that are to be enqueued by
+ * the SW adapter. It is the callback's responsibility to arrange
+ * these events at the beginning of the array, if these events are
+ * not contiguous in the original array. The *nb_dropped* parameter is
+ * a pointer to the number of events dropped by the callback, this
+ * number is used by the adapter to indicate the number of dropped packets
+ * as part of its statistics.
  *
  * @param eth_dev_id
  *  Port identifier of the Ethernet device.
@@ -231,27 +243,26 @@ struct rte_event_eth_rx_adapter_stats {
  * @param enqueue_buf_size
  *  Total enqueue buffer size.
  * @param enqueue_buf_count
- *  mbuf count in enqueue buffer.
- * @param mbuf
- *  mbuf array.
- * @param nb_mbuf
- *  mbuf count.
+ *  Event count in enqueue buffer.
+ * @param[in, out] ev
+ *  Event array.
+ * @param nb_event
+ *  Event array length.
  * @param cb_arg
  *  Callback argument.
- * @param[out] enq_mbuf
- *  The adapter enqueues enq_mbuf[] if the return value of the
- *  callback is less than nb_mbuf
+ * @param[out] nb_dropped
+ *  Packets dropped by callback.
  * @return
- *  Returns the number of mbufs should be enqueued to eventdev
+ *  - The number of events to be enqueued by the SW adapter.
  */
 typedef uint16_t (*rte_event_eth_rx_adapter_cb_fn)(uint16_t eth_dev_id,
uint16_t queue_id,
uint32_t enqueue_buf_size,
uint32_t enqueue_buf_count,
- 

[dpdk-dev] [PATCH] eal: fix positive error codes from probe/remove

2019-05-30 Thread Ilya Maximets
According to API, 'rte_dev_probe()' and 'rte_dev_remove()' and their
'hotplug' equivalents must return 0 or negative error code. Bus code
returns positive values if device wasn't recognized by any driver, so
the result of 'bus->plug/unplug()' must be converted.

Positive on remove means that device not found by driver.
Positive on probe means that there are no suitable buses/drivers,
i.e. device is not supported.

CC: sta...@dpdk.org
Fixes: a3ee360f4440 ("eal: add hotplug add/remove device")
Fixes: 244d5130719c ("eal: enable hotplug on multi-process")

Signed-off-by: Ilya Maximets 
---
 lib/librte_eal/common/eal_common_dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
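
The conversion applied in both hunks below can be sketched as one small
helper. This is only an illustration: sketch_normalize_ret is a
hypothetical name and is not part of the patch, which open-codes the same
ternary at each call site.

```c
#include <errno.h>

/*
 * Map a bus-level return value onto the "0 or negative errno" contract.
 * fallback_errno is the positive errno to report when the bus returned
 * a positive "not handled by any driver" value.
 */
static int
sketch_normalize_ret(int ret, int fallback_errno)
{
	if (ret <= 0)
		return ret;	/* success or a real negative error code */
	return -fallback_errno;
}
```

With this shape, the probe path would pass ENOTSUP and the remove path
ENOENT, matching the two replacements in the diff.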

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 824b8f926..f9cae8e26 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -233,7 +233,7 @@ rte_dev_probe(const char *devargs)
 * process.
 */
if (ret != -EEXIST)
-   return ret;
+   return (ret < 0) ? ret : -ENOTSUP;
}
 
/* primary send attach sync request to secondary. */
@@ -319,7 +319,7 @@ local_dev_remove(struct rte_device *dev)
if (ret) {
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
dev->name);
-   return ret;
+   return (ret < 0) ? ret : -ENOENT;
}
 
return 0;
-- 
2.17.1



Re: [dpdk-dev] [PATCH 2/2] meson: make build configurable

2019-05-30 Thread Luca Boccassi
On Thu, 2019-05-30 at 14:59 +0300, Ilya Maximets wrote:
> On 30.05.2019 14:06, Luca Boccassi wrote:
> > On Thu, 2019-05-30 at 13:03 +0300, Ilya Maximets wrote:
> > > On 29.05.2019 23:37, Luca Boccassi wrote:
> > > > On Wed, 2019-05-29 at 19:39 +0300, Ilya Maximets wrote:
> > > > > The first thing many developers do before they start building
> > > > > DPDK is disable all the drivers and libraries they don't need.
> > > > > This happens because more than half of the DPDK drivers and
> > > > > libraries are not needed for any particular use case. For
> > > > > example, you don't need dpaa*, octeon*, various crypto devices,
> > > > > eventdev, etc. if you only want to build OVS for x86_64 with
> > > > > static linking.
> > > > > 
> > > > > By disabling everything you don't need, the build speeds up
> > > > > literally 10x. This is important for CI systems. For example,
> > > > > TravisCI wastes 10 minutes on the default DPDK build just to
> > > > > check linking with OVS.
> > > > > 
> > > > > Another thing is the binary size. The number of DPDK libraries
> > > > > and, as a result, the size of the resulting statically linked
> > > > > application decrease significantly.
> > > > > 
> > > > > It is also important that you can skip installing some
> > > > > dependencies if you don't have them on a target platform: just
> > > > > disable the libs/drivers that depend on them. The same applies to
> > > > > a glibc version mismatch between build and target platforms.
> > > > > 
> > > > > Also, I have to note that less code means a lower probability of
> > > > > failures and fewer attack vectors.
> > > > > 
> > > > > This patch gives 'meson' the power of configurability that we
> > > > > have with 'make'. Using the new options it's possible to enable
> > > > > just what you need and nothing more.
> > > > > 
> > > > > For example, the following cmdline could be used to build an
> > > > > almost minimal set of DPDK libs and drivers to check the OVS build:
> > > > > 
> > > > >   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false \
> > > > >       -Ddrivers_bus=pci,vdev \
> > > > >       -Ddrivers_mempool=ring \
> > > > >       -Ddrivers_net=null,virtio,ring \
> > > > >       -Ddrivers_crypto=virtio \
> > > > >       -Ddrivers_compress=none \
> > > > >       -Ddrivers_event=none \
> > > > >       -Ddrivers_baseband=none \
> > > > >       -Ddrivers_raw=none \
> > > > >       -Ddrivers_common=none \
> > > > >       -Dlibs=kvargs,eal,cmdline,ring,mempool,mbuf,net,meter,ethdev,pci,hash,cryptodev,pdump,vhost \
> > > > >       -Dapps=none
> > > > > 
> > > > > Adding a few real net drivers will give a configuration that can
> > > > > be used in a production environment.
> > > > > 
> > > > > It doesn't look very pretty, but this could be moved to a script.
> > > > > 
> > > > > Build details:
> > > > > 
> > > > >   Build targets in project: 57
> > > > > 
> > > > >   $ time ninja
> > > > >   real    0m11,528s
> > > > >   user    1m4,137s
> > > > >   sys     0m4,935s
> > > > > 
> > > > >   $ du -sh ../dpdk_meson_install/
> > > > >   3,5M    ../dpdk_meson_install/
> > > > > 
> > > > > To compare with what we have without these options:
> > > > > 
> > > > >   $ meson build -Dexamples='' -Dtests=false -Denable_kmods=false
> > > > >   Build targets in project: 434
> > > > > 
> > > > >   $ time ninja
> > > > >   real    1m38,963s
> > > > >   user    10m18,624s
> > > > >   sys     0m45,478s
> > > > > 
> > > > >   $ du -sh ../dpdk_meson_install/
> > > > >   27M     ../dpdk_meson_install/
> > > > > 
> > > > > 10x speed up for the user time.
> > > > > 7.7 times size decrease.
> > > > > 
> > > > > This is probably not very user-friendly because it's not Kconfig
> > > > > and dependency tracking in meson is really poor, so it usually
> > > > > takes a few iterations to pick a set of libraries that satisfies
> > > > > all dependencies. However, it's not a big deal. The options are
> > > > > intended for proficient users who know what they need.
> > > > 
> > > > Hi,
> > > > 
> > > > We talked about this a few times in the past, and it was actually
> > > > one of the design goals to _avoid_ replicating the octopus-like
> > > > config system of the makefiles. That's because it makes the test
> > > > matrix insanely complicated, not to mention the harm to user
> > > > friendliness, among other things.
> > > > 
> > > > If someone doesn't want to use a PMD, they can just avoid
> > > > installing it - it's simple enough.
> > > 
> > > So how c

Re: [dpdk-dev] [PATCH v2] meson: don't check dependencies for tests if not required

2019-05-30 Thread Luca Boccassi
On Thu, 2019-05-30 at 15:38 +0300, Ilya Maximets wrote:
> Don't need to check dependencies if test apps will not be built
> anyway.
> 
> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
> ---
> 
> Version 2:
>   - 'get_option('tests')' check moved to the top.
> 
>  app/test/meson.build | 141 ++-----
>  1 file changed, 72 insertions(+), 69 deletions(-)

Acked-by: Luca Boccassi 

-- 
Kind regards,
Luca Boccassi


Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors

2019-05-30 Thread Thomas Monjalon
30/05/2019 12:11, Bruce Richardson:
> On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote:
> > 30/05/2019 09:31, David Marchand:
> > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
> > > step...@networkplumber.org> wrote:
> > > 
> > > > On Thu, 30 May 2019 00:46:30 +0200
> > > > Thomas Monjalon  wrote:
> > > >
> > > > > 23/05/2019 15:58, David Marchand:
> > > > > > From: Stephen Hemminger 
> > > > > >
> > > > > > The fields of the internal EAL core configuration are currently
> > > > > > laid bare as part of the API. This is not good practice and limits
> > > > > > fixing issues with layout and sizes.
> > > > > >
> > > > > > Make new accessor functions for the fields used by current drivers
> > > > > > and examples.
> > > > > [...]
> > > > > > +DPDK_19.08 {
> > > > > > +   global:
> > > > > > +
> > > > > > +   rte_lcore_cpuset;
> > > > > > +   rte_lcore_index;
> > > > > > +   rte_lcore_to_cpu_id;
> > > > > > +   rte_lcore_to_socket_id;
> > > > > > +
> > > > > > +} DPDK_19.05;
> > > > > > +
> > > > > >  EXPERIMENTAL {
> > > > > > global:
> > > > >
> > > > > Just to make sure, are we OK to introduce these functions
> > > > > as non-experimental?
> > > >
> > > > They were in previous releases as inlines this patch converts them
> > > > to real functions.
> > > >
> > > >
> > > Well, yes and no.
> > > 
> > > rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
> > > part of the ABI is fine for me.
> > > 
> > > rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
> > > adding it to the ABI is ok for me.
> > 
> > It is used by DPAA and some test.
> > I guess adding as experimental is fine too?
> > I'm fine with both options, I'm just trying to apply the policy
> > we agreed on. Does this case deserve an exception?
> > 
> 
> While it may be a good candidate, I'm not sure how much making an exception
> for it really matters. I'd be tempted to just mark it experimental and then
> have it stable for the 19.11 release. What do we really lose by waiting a
> release to stabilize it?

I would agree with Bruce.
If there are no more comments, I will wait for a v5 of this series.





Re: [dpdk-dev] [PATCH v2 1/3] net/af_xdp: enable zero copy by extbuf

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 17:07:05 +0800
Xiaolong Ye  wrote:

> Implement zero copy of af_xdp pmd through mbuf's external memory
> mechanism to achieve high performance.
> 
> This patch also provides a new parameter "pmd_zero_copy" for user, so they
> can choose to enable zero copy of af_xdp pmd or not.
> 
> To be clear, "zero copy" here is different from the "zero copy mode" of
> AF_XDP, it is about zero copy between af_xdp umem and mbuf used in dpdk
> application.
> 
> Suggested-by: Varghese Vipin 
> Suggested-by: Tummala Sivaprasad 
> Suggested-by: Olivier Matz 
> Signed-off-by: Xiaolong Ye 

Why is this a parameter? Can it just be auto-detected?
Remember, configuration is evil: it hurts usability and code coverage,
and it increases complexity.


Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: add multi-queue support

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 17:07:06 +0800
Xiaolong Ye  wrote:

> This patch adds two parameters `start_queue` and `queue_count` to
> specify the range of netdev queues used by AF_XDP pmd.
> 
> Signed-off-by: Xiaolong Ye 

Why does this have to be a config option? We already have max queues
and number of queues in the DPDK configuration.


Re: [dpdk-dev] [PATCH v2] doc/testpmd: update compile steps for bpf examples

2019-05-30 Thread Varghese, Vipin
Hi Thomas,

Snipped

> > +
> > +   To built other BPF examples, the compiler requires additional command-line options.
> 
> "To built" -> "To build"
ok

> 
> I think this note is vague. Don't you think it may confuse user if we don't
> explicit which kind of options are required?

OK, the `v1` content was `In order to build t2.c and t3.c; pass DPDK targets 
include and library path as compiler options.`. But as users add other library 
functions to `examples/bpf`, these may vary too.

Hence, should we state it as `To build other BPF examples, appropriate libraries 
and dependencies are to be passed as command line options.`?





[dpdk-dev] [Bug 288] Target name recorded wrong when try to build dpdk with x86_64-native-linux-gcc

2019-05-30 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=288

Bug ID: 288
   Summary: Target name recorded wrong when try to build dpdk with
x86_64-native-linux-gcc
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: CONFIRMED
  Severity: minor
  Priority: Normal
 Component: mk
  Assignee: dev@dpdk.org
  Reporter: jasvinder.si...@intel.com
  Target Milestone: ---

When building dpdk with the x86_64-native-linux-gcc target, the final build
message reports the wrong target name, x86_64-native-linuxapp-gcc, instead of
x86_64-native-linux-gcc.

Log info: 
$make install T=x86_64-native-linux-gcc -j
   Configuration done using x86_64-native-linux-gcc
== Build lib
== Build lib/librte_kvargs
== Build lib/librte_cfgfile
   SYMLINK-FILE include/rte_cfgfile.h
   CC rte_cfgfile.o
   SYMLINK-FILE include/rte_kvargs.h
   CC rte_kvargs.o




  INSTALL-APP testpipeline
  INSTALL-MAP testpipeline.map
  INSTALL-APP testbbdev
  INSTALL-MAP testbbdev.map
  INSTALL-APP testpmd
  INSTALL-MAP testpmd.map
  LD test
  INSTALL-APP test
  INSTALL-MAP test.map
Build complete [x86_64-native-linuxapp-gcc]  <--

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] DPDK Release Status Meeting 30/5/2019

2019-05-30 Thread Ferruh Yigit
Minutes 30 May 2019
---

Agenda:
* Release Dates
* Subtrees
* OvS
* Conferences
* Opens


Participants:
* Debian/Microsoft
* Intel
* Marvell
* Mellanox
* Red Hat


Release Dates
-

* v19.08 dates:
  * Proposal/V1   Monday 03 June   2019  
  * Integration/Merge/RC1 Monday 01 July   2019  
  * Release   Thurs  01 August 2019

  * Reminder to send roadmaps for the release, it helps planning
    * Intel and Arm already shared the roadmap
    * Marvell will have new PMDs and will provide a roadmap

* v19.11 proposed dates, *please comment*,
  * Proposal/V1   Friday 06 September 2019
  * Integration/Merge/RC1 Friday 11 October   2019
  * Release   Friday 08 November  2019

  * Constraints:
    * PRC holidays on October 1-7 inclusive, rc1 shouldn't overlap with them
    * US DPDK Summit in mid-November, better to have the release before the summit


Subtrees


* main
  * Nothing critical, weekly merge done, pulled from sub-trees
  * rte_ prefix patchset in master, may affect existing patches
  * KNI ethtool removal merged
  * meson fix by Bruce to fix daily Intel builds is waiting

* next-net
  * Merging patches, nothing critical
  * Two new PMDs submitted for this release, memif & hinic

* next-eventdev
  * Nothing merged yet for 19.08

* next-virtio
* next-crypto
* next-pipeline
* next-qos
  * no update received

* Stable trees
  * v18.11.2-rc1 is waiting for testing
    * 11 June is the target release day
    * Red Hat and OvS (Ian) tested it
    * Waiting for tests from others such as Intel & Mellanox
    * Next week is the only full week before the target release date
    * Microsoft will test when possible


OvS
---

* 18.11.2-rc1 validation has been completed
* There is an OvS patch available to use af_xdp PMD, via 19.08.0-rc0


Conferences
---

* DPDK Userspace summit: DPDK Userspace · Sept. 19-20, 2019
  https://www.dpdk.org/event/dpdk-userspace-bordeaux/
  * CFP Opens: Monday, April 29
  * CFP Closes: Friday, May 31

  * Reminder that CFP closes tomorrow

* US summit dates are not fixed yet


Opens
-

* There is a potential that 19.11 will be big, need to think about ways to
  reduce the risk of delay for the release

* New tool, public-inbox mail archive is enabled:
  * http://inbox.dpdk.org/announce/db6pr0501mb2167b67f9f92f45a8823c1a5d7...@db6pr0501mb2167.eurprd05.prod.outlook.com/


DPDK Release Status Meetings


The DPDK Release Status Meeting is intended for DPDK Committers to discuss
the status of the master tree and sub-trees, and for project managers to
track progress or milestone dates.

The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend just
send an email to "John McNamara " for the invite.


Re: [dpdk-dev] [Bug 287] netvsc PMD/dpdk/azure: Driver lockup with multi-queue configuration

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 00:10:21 +
bugzi...@dpdk.org wrote:

> https://bugs.dpdk.org/show_bug.cgi?id=287
> 
> Bug ID: 287
>Summary: netvsc PMD/dpdk/azure: Driver lockup with multi-queue
> configuration
>Product: DPDK
>Version: 18.11
>   Hardware: x86
> OS: Linux
> Status: CONFIRMED
>   Severity: normal
>   Priority: Normal
>  Component: ethdev
>   Assignee: dev@dpdk.org
>   Reporter: mohsinmazhar_sha...@trendmicro.com
>   Target Milestone: ---
> 
> I am running an app using dpdk 18.11 netvsc PMD
> (https://doc.dpdk.org/guides/nics/netvsc.html) on "Ubuntu 18.04 LTS" VM on
> azure running kernel 4.18.0-1018
> (https://launchpad.net/ubuntu/+source/linux-azure/4.18.0-1018.18). The app uses
> multi-queue with 2 cores doing RX/TX. The lockup only occurs when doing a
> connections/second test i.e. exercising the netvsc interface. When the lockup
> occurs the netvsc interface and its corresponding mellanox slave both can't
> rx/tx packets.
> 

Thanks for the report, busy this week, may have time to address it next week.



[dpdk-dev] [PATCH 1/3] power: add new packet type for capabilities

2019-05-30 Thread Hajkowski
From: Marcin Hajkowski 

Add new packet type and commands for capabilities query.

Signed-off-by: Marcin Hajkowski 
---
 lib/librte_power/channel_commands.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
index ce587283c..b1f5584a8 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -34,6 +34,8 @@ extern "C" {
 /* CPU Power Queries */
 #define CPU_POWER_QUERY_FREQ_LIST  7
 #define CPU_POWER_QUERY_FREQ   8
+#define CPU_POWER_QUERY_CAPS_LIST  9
+#define CPU_POWER_QUERY_CAPS   10
 
 /* --- Outgoing messages --- */
 
@@ -43,6 +45,7 @@ extern "C" {
 
 /* CPU Power Query Responses */
 #define CPU_POWER_FREQ_LIST 3
+#define CPU_POWER_CAPS_LIST 4
 
 #define HOURS 24
 
@@ -106,6 +109,17 @@ struct channel_packet_freq_list {
uint8_t num_vcpu;
 };
 
+struct channel_packet_caps_list {
+   uint64_t resource_id; /**< core_num, device */
+   uint32_t unit;/**< scale down/up/min/max */
+   uint32_t command; /**< Power, IO, etc */
+   char vm_name[VM_MAX_NAME_SZ];
+
+   uint64_t turbo[MAX_VCPU_PER_VM];
+   uint64_t priority[MAX_VCPU_PER_VM];
+   uint8_t num_vcpu;
+};
+
 
 #ifdef __cplusplus
 }
-- 
2.17.2



[dpdk-dev] [PATCH 2/3] examples/power_manager: send cpu capabilities on vm request

2019-05-30 Thread Hajkowski
From: Marcin Hajkowski 

Send capabilities for requested cores.

Signed-off-by: Marcin Hajkowski 
---
 examples/vm_power_manager/channel_monitor.c | 67 +
 1 file changed, 67 insertions(+)

diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c
index bfd9cc38d..731b3b480 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "channel_monitor.h"
@@ -704,6 +705,60 @@ send_freq(struct channel_packet *pkt,
chan_info);
 }
 
+static int
+send_capabilities(struct channel_packet *pkt,
+   struct channel_info *chan_info,
+   bool list_requested)
+{
+   unsigned int vcore_id = pkt->resource_id;
+   struct channel_packet_caps_list channel_pkt_caps_list;
+   struct vm_info info;
+   struct rte_power_core_capabilities caps;
+   int ret;
+
+   if (get_info_vm(pkt->vm_name, &info) != 0)
+   return -1;
+
+   if (!list_requested && vcore_id >= MAX_VCPU_PER_VM)
+   return -1;
+
+   if (!info.allow_query)
+   return -1;
+
+   channel_pkt_caps_list.command = CPU_POWER_CAPS_LIST;
+   channel_pkt_caps_list.num_vcpu = info.num_vcpus;
+
+   if (list_requested) {
+   unsigned int i;
+   for (i = 0; i < info.num_vcpus; i++) {
+   ret = rte_power_get_capabilities(info.pcpu_map[i],
+   &caps);
+   if (ret == 0) {
+   channel_pkt_caps_list.turbo[i] =
+   caps.turbo;
+   channel_pkt_caps_list.priority[i] =
+   caps.priority;
+   } else
+   return -1;
+
+   }
+   } else {
+   ret = rte_power_get_capabilities(info.pcpu_map[vcore_id],
+   &caps);
+   if (ret == 0) {
+   channel_pkt_caps_list.turbo[vcore_id] =
+   caps.turbo;
+   channel_pkt_caps_list.priority[vcore_id] =
+   caps.priority;
+   } else
+   return -1;
+   }
+
+   return write_binary_packet(&channel_pkt_caps_list,
+   sizeof(channel_pkt_caps_list),
+   chan_info);
+}
+
 static int
 send_ack_for_received_cmd(struct channel_packet *pkt,
struct channel_info *chan_info,
@@ -812,6 +867,18 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info)
RTE_LOG(ERR, CHANNEL_MONITOR, "Error during frequency sending.\n");
}
 
+   if (pkt->command == CPU_POWER_QUERY_CAPS_LIST ||
+   pkt->command == CPU_POWER_QUERY_CAPS) {
+
+   RTE_LOG(INFO, CHANNEL_MONITOR,
+   "Capabilities for %s requested.\n", pkt->vm_name);
+   int ret = send_capabilities(pkt,
+   chan_info,
+   pkt->command == CPU_POWER_QUERY_CAPS_LIST);
+   if (ret < 0)
+   RTE_LOG(ERR, CHANNEL_MONITOR, "Error during sending capabilities.\n");
+   }
+
/*
 * Return is not checked as channel status may have been set to DISABLED
 * from management thread
-- 
2.17.2



[dpdk-dev] [PATCH 0/3] Core capabilities query

2019-05-30 Thread Hajkowski
From: Marcin Hajkowski 

Extend guest channel and sample apps to query CPU capabilities.

Please note that these changes depend on
(http://patchwork.dpdk.org/cover/52335/) and
(http://patchwork.dpdk.org/cover/52213/) which should be applied first.


Marcin Hajkowski (3):
  power: add new packet type for capabilities
  examples/power_manager: send cpu capabilities on vm request
  examples/power_guest: send request for specified core capabilities

 examples/vm_power_manager/channel_monitor.c   |  67 ++
 .../guest_cli/vm_power_cli_guest.c| 119 +-
 lib/librte_power/channel_commands.h   |  14 +++
 3 files changed, 198 insertions(+), 2 deletions(-)

-- 
2.17.2



[dpdk-dev] [PATCH 3/3] examples/power_guest: send request for specified core capabilities

2019-05-30 Thread Hajkowski
From: Marcin Hajkowski 

Send request to power manager for core id provided
by user to get related capabilities.

Signed-off-by: Marcin Hajkowski 
---
 .../guest_cli/vm_power_cli_guest.c| 119 +-
 1 file changed, 117 insertions(+), 2 deletions(-)
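
The argument handling in cmd_query_caps_list_parsed() below accepts either
"all" or a core number. A minimal standalone sketch of that parsing step
follows; sketch_parse_cpu_num and the SKETCH_MAX_VCPU_PER_VM value are
hypothetical and chosen for illustration. Unlike the patch, the sketch
range-checks the value while it is still a signed long and rejects trailing
characters; those are choices made here, not behaviour of the patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_VCPU_PER_VM 8	/* assumed value for illustration */

/*
 * Parse the cpu_num token. Returns 1 for "all", 0 for a valid core
 * number (stored in *core), -1 for anything else.
 */
static int
sketch_parse_cpu_num(const char *arg, unsigned int *core)
{
	char *ep;
	long val;

	if (strcmp(arg, "all") == 0)
		return 1;

	errno = 0;
	val = strtol(arg, &ep, 10);
	if (errno != 0 || ep == arg || *ep != '\0')
		return -1;	/* overflow, no digits, or trailing junk */
	if (val < 0 || val >= SKETCH_MAX_VCPU_PER_VM)
		return -1;	/* range check before narrowing to unsigned */

	*core = (unsigned int)val;
	return 0;
}
```

The caller would then pick CPU_POWER_QUERY_CAPS_LIST for a return of 1 and
CPU_POWER_QUERY_CAPS with resource_id set to the parsed core for a return
of 0.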

diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
index 848230248..de85c1406 100644
--- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
+++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
@@ -132,7 +132,7 @@ struct cmd_freq_list_result {
 };
 
 static int
-query_freq_list(struct channel_packet *pkt, unsigned int lcore_id)
+query_data(struct channel_packet *pkt, unsigned int lcore_id)
 {
int ret;
ret = rte_power_guest_channel_send_msg(pkt, lcore_id);
@@ -206,7 +206,7 @@ cmd_query_freq_list_parsed(void *parsed_result,
pkt.resource_id = lcore_id;
}
 
-   ret = query_freq_list(&pkt, lcore_id);
+   ret = query_data(&pkt, lcore_id);
if (ret < 0) {
cmdline_printf(cl, "Error during sending frequency list query.\n");
return;
@@ -248,6 +248,120 @@ cmdline_parse_inst_t cmd_query_freq_list = {
},
 };
 
+struct cmd_query_caps_result {
+   cmdline_fixed_string_t query_caps;
+   cmdline_fixed_string_t cpu_num;
+};
+
+static int
+receive_capabilities(struct channel_packet_caps_list *pkt_caps_list,
+   unsigned int lcore_id)
+{
+   int ret;
+
+   ret = rte_power_guest_channel_receive_msg(pkt_caps_list,
+   sizeof(struct channel_packet_caps_list),
+   lcore_id);
+   if (ret < 0) {
+   RTE_LOG(ERR, GUEST_CLI, "Error receiving message.\n");
+   return -1;
+   }
+   if (pkt_caps_list->command != CPU_POWER_CAPS_LIST) {
+   RTE_LOG(ERR, GUEST_CLI, "Unexpected message received.\n");
+   return -1;
+   }
+   return 0;
+}
+
+static void
+cmd_query_caps_list_parsed(void *parsed_result,
+   __rte_unused struct cmdline *cl,
+   __rte_unused void *data)
+{
+   struct cmd_query_caps_result *res = parsed_result;
+   unsigned int lcore_id;
+   struct channel_packet_caps_list pkt_caps_list;
+   struct channel_packet pkt;
+   bool query_list = false;
+   int ret;
+   char *ep;
+
+   memset(&pkt, 0, sizeof(struct channel_packet));
+   memset(&pkt_caps_list, 0, sizeof(struct channel_packet_caps_list));
+
+   if (!strcmp(res->cpu_num, "all")) {
+
+   /* Get first enabled lcore. */
+   lcore_id = rte_get_next_lcore(-1,
+   0,
+   0);
+   if (lcore_id == RTE_MAX_LCORE) {
+   cmdline_printf(cl, "Enabled core not found.\n");
+   return;
+   }
+
+   pkt.command = CPU_POWER_QUERY_CAPS_LIST;
+   strcpy(pkt.vm_name, policy.vm_name);
+   query_list = true;
+   } else {
+   errno = 0;
+   lcore_id = (unsigned int)strtol(res->cpu_num, &ep, 10);
+   if (errno != 0 || lcore_id >= MAX_VCPU_PER_VM ||
+   ep == res->cpu_num) {
+   cmdline_printf(cl, "Invalid parameter provided.\n");
+   return;
+   }
+   pkt.command = CPU_POWER_QUERY_CAPS;
+   strcpy(pkt.vm_name, policy.vm_name);
+   pkt.resource_id = lcore_id;
+   }
+
+   ret = query_data(&pkt, lcore_id);
+   if (ret < 0) {
+   cmdline_printf(cl, "Error during sending capabilities query.\n");
+   return;
+   }
+
+   ret = receive_capabilities(&pkt_caps_list, lcore_id);
+   if (ret < 0) {
+   cmdline_printf(cl, "Error during capabilities reception.\n");
+   return;
+   }
+   if (query_list) {
+   unsigned int i;
+   for (i = 0; i < pkt_caps_list.num_vcpu; ++i)
+   cmdline_printf(cl, "Capabilities of [%d] vcore are:"
+   " turbo possibility: %ld, is priority core: %ld.\n",
+   i,
+   pkt_caps_list.turbo[i],
+   pkt_caps_list.priority[i]);
+   } else {
+   cmdline_printf(cl, "Capabilities of [%d] vcore are:"
+   " turbo possibility: %ld, is priority core: %ld.\n",
+   lcore_id,
+   pkt_caps_list.turbo[lcore_id],
+   pkt_caps_list.priority[lcore_id]);
+   }
+}
+
+cmdline_parse_token_string_t cmd_query_caps_token =
+   TOKEN_STRING_INITIALIZER(struct cmd_query_caps_result, query_caps, "query_cpu_caps");
+cmdline_parse_token_string_t cmd_query_cap

Re: [dpdk-dev] [PATCH v2] ipsec: include high order bytes of esn in pkt len

2019-05-30 Thread Ananyev, Konstantin
Hi Lukasz,

> diff --git a/lib/librte_ipsec/esp_outb.c b/lib/librte_ipsec/esp_outb.c
> index c798bc4..ed5974b 100644
> --- a/lib/librte_ipsec/esp_outb.c
> +++ b/lib/librte_ipsec/esp_outb.c
> @@ -126,11 +126,11 @@ outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
> 
>   /* pad length + esp tail */
>   pdlen = clen - plen;
> - tlen = pdlen + sa->icv_len;
> + tlen = pdlen + sa->icv_len + sa->sqh_len;

We probably don't want to increase pkt_len by sa->sqh_len for the inline case.
That's why I suggested passing sqh_len as a parameter to that function.
Then for inline we can just pass 0.
Do you see any obstacles with that approach?
The same applies to transport mode.
Konstantin
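
A minimal sketch of the suggestion above, under stated assumptions: struct
sa_lens, outb_tail_len() and outb_tail_fits() are hypothetical stand-ins and
not part of librte_ipsec. The point is only that the caller decides how many
high-order ESN bytes are counted, so the inline path can pass 0 while the
lookaside path passes sa->sqh_len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the SA fields used here. */
struct sa_lens {
	uint32_t icv_len; /* integrity check value length */
	uint32_t sqh_len; /* high-order ESN bytes, 0 when ESN is off */
	uint32_t aad_len; /* additional authenticated data length */
};

/*
 * Tail length appended to the packet: padding + ICV + whatever the
 * caller passes as sqh_len. A lookaside caller would pass sa->sqh_len,
 * an inline caller would pass 0.
 */
uint32_t
outb_tail_len(const struct sa_lens *sa, uint32_t pdlen, uint32_t sqh_len)
{
	assert(sa != NULL);
	return pdlen + sa->icv_len + sqh_len;
}

/* Returns 1 if the tail plus the AAD fits into the given tailroom. */
int
outb_tail_fits(const struct sa_lens *sa, uint32_t tlen, uint32_t tailroom)
{
	assert(sa != NULL);
	return tlen + sa->aad_len <= tailroom;
}
```

With this shape the tailroom check no longer needs to add sqh_len separately,
because it is already part of tlen whenever the caller asked for it.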

> 
>   /* do append and prepend */
>   ml = rte_pktmbuf_lastseg(mb);
> - if (tlen + sa->sqh_len + sa->aad_len > rte_pktmbuf_tailroom(ml))
> + if (tlen + sa->aad_len > rte_pktmbuf_tailroom(ml))
>   return -ENOSPC;
> 
>   /* prepend header */
> @@ -152,8 +152,8 @@ outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
>   rte_memcpy(ph, sa->hdr, sa->hdr_len);
> 
>   /* update original and new ip header fields */
> - update_tun_l3hdr(sa, ph + sa->hdr_l3_off, mb->pkt_len, sa->hdr_l3_off,
> - sqn_low16(sqc));
> + update_tun_l3hdr(sa, ph + sa->hdr_l3_off, mb->pkt_len - sa->sqh_len,
> + sa->hdr_l3_off, sqn_low16(sqc));
> 
>   /* update spi, seqn and iv */
>   esph = (struct esp_hdr *)(ph + sa->hdr_len);
> @@ -292,11 +292,11 @@ outb_trs_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
> 
>   /* pad length + esp tail */
>   pdlen = clen - plen;
> - tlen = pdlen + sa->icv_len;
> + tlen = pdlen + sa->icv_len + sa->sqh_len;
> 
>   /* do append and insert */
>   ml = rte_pktmbuf_lastseg(mb);
> - if (tlen + sa->sqh_len + sa->aad_len > rte_pktmbuf_tailroom(ml))
> + if (tlen + sa->aad_len > rte_pktmbuf_tailroom(ml))
>   return -ENOSPC;
> 
>   /* prepend space for ESP header */
> @@ -314,8 +314,8 @@ outb_trs_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
>   insert_esph(ph, ph + hlen, uhlen);
> 
>   /* update ip  header fields */
> - np = update_trs_l3hdr(sa, ph + l2len, mb->pkt_len, l2len, l3len,
> - IPPROTO_ESP);
> + np = update_trs_l3hdr(sa, ph + l2len, mb->pkt_len - sa->sqh_len, l2len,
> + l3len, IPPROTO_ESP);
> 
>   /* update spi, seqn and iv */
>   esph = (struct esp_hdr *)(ph + uhlen);
> @@ -425,6 +425,9 @@ esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
>   for (i = 0; i != num; i++) {
>   if ((mb[i]->ol_flags & PKT_RX_SEC_OFFLOAD_FAILED) == 0) {
>   ml = rte_pktmbuf_lastseg(mb[i]);
> + /* remove high-order 32 bits of esn from packet len */
> + mb[i]->pkt_len -= sa->sqh_len;
> + ml->data_len -= sa->sqh_len;
>   icv = rte_pktmbuf_mtod_offset(ml, void *,
>   ml->data_len - icv_len);
>   remove_sqh(icv, icv_len);


Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors

2019-05-30 Thread David Marchand
On Thu, May 30, 2019 at 3:39 PM Thomas Monjalon  wrote:

> 30/05/2019 12:11, Bruce Richardson:
> > On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote:
> > > 30/05/2019 09:31, David Marchand:
> > > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
> > > > step...@networkplumber.org> wrote:
> > > >
> > > > > On Thu, 30 May 2019 00:46:30 +0200
> > > > > Thomas Monjalon  wrote:
> > > > >
> > > > > > 23/05/2019 15:58, David Marchand:
> > > > > > > From: Stephen Hemminger 
> > > > > > >
> > > > > > > The fields of the internal EAL core configuration are currently
> > > > > > > > laid bare as part of the API. This is not good practice and limits
> > > > > > > fixing issues with layout and sizes.
> > > > > > >
> > > > > > > > Make new accessor functions for the fields used by current drivers
> > > > > > > and examples.
> > > > > > [...]
> > > > > > > +DPDK_19.08 {
> > > > > > > +   global:
> > > > > > > +
> > > > > > > +   rte_lcore_cpuset;
> > > > > > > +   rte_lcore_index;
> > > > > > > +   rte_lcore_to_cpu_id;
> > > > > > > +   rte_lcore_to_socket_id;
> > > > > > > +
> > > > > > > +} DPDK_19.05;
> > > > > > > +
> > > > > > >  EXPERIMENTAL {
> > > > > > > global:
> > > > > >
> > > > > > Just to make sure, are we OK to introduce these functions
> > > > > > as non-experimental?
> > > > >
> > > > > They were in previous releases as inlines this patch converts them
> > > > > to real functions.
> > > > >
> > > > >
> > > > Well, yes and no.
> > > >
> > > > rte_lcore_index and rte_lcore_to_socket_id already existed, so making them
> > > > part of the ABI is fine for me.
> > > >
> > > > rte_lcore_to_cpu_id is new but seems quite safe in how it can be used,
> > > > adding it to the ABI is ok for me.
> > >
> > > It is used by DPAA and some test.
> > > I guess adding as experimental is fine too?
> > > I'm fine with both options, I'm just trying to apply the policy
> > > we agreed on. Does this case deserve an exception?
> > >
> >
> > While it may be a good candidate, I'm not sure how much making an exception
> > for it really matters. I'd be tempted to just mark it experimental and then
> > have it stable for the 19.11 release. What do we really lose by waiting a
> > release to stabilize it?
>
> I would agree Bruce.
> If no more comment, I will wait for a v5 of this series.
>

I agree that there is no reason to make an exception for the two new ones.

But to me the existing rte_lcore_index and rte_lcore_to_socket_id must be
marked as stable.
This is to avoid breaking existing users that did not set
ALLOW_EXPERIMENTAL_API.

I will prepare a v5 later.


-- 
David Marchand


[dpdk-dev] [PATCH] cryptodev: free memzone when releasing cryptodev

2019-05-30 Thread Junxiao Shi
When a cryptodev is created in a primary process,
rte_cryptodev_data_alloc reserves a memzone.
However, this memzone is not released when the cryptodev
is uninitialized. After that, a new cryptodev cannot be
created because of a memzone name conflict.

This commit frees the memzone when a cryptodev is
uninitialized, fixing this bug. This approach is chosen
instead of keeping and reusing the old memzone, because
the new cryptodev could belong to a different NUMA socket.

Also, the rte_cryptodev_data pointer is now properly recorded
in the cryptodev_globals.data array.

Bugzilla ID: 105

Signed-off-by: Junxiao Shi 
---
 lib/librte_cryptodev/rte_cryptodev.c | 44 +++-
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 00c2cf4..666dfea 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -653,6 +653,31 @@ rte_cryptodev_data_alloc(uint8_t dev_id, struct rte_cryptodev_data **data,
return 0;
 }
 
+static inline int
+rte_cryptodev_data_free(uint8_t dev_id, struct rte_cryptodev_data **data)
+{
+   char mz_name[RTE_CRYPTODEV_NAME_MAX_LEN];
+   const struct rte_memzone *mz;
+   int n;
+
+   /* generate memzone name */
+   n = snprintf(mz_name, sizeof(mz_name), "rte_cryptodev_data_%u", dev_id);
+   if (n >= (int)sizeof(mz_name))
+   return -EINVAL;
+
+   mz = rte_memzone_lookup(mz_name);
+   if (mz == NULL)
+   return -ENOMEM;
+
+   RTE_ASSERT(*data == mz->addr);
+   *data = NULL;
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   return rte_memzone_free(mz);
+
+   return 0;
+}
+
 static uint8_t
 rte_cryptodev_find_free_device_index(void)
 {
@@ -687,16 +712,16 @@ rte_cryptodev_pmd_allocate(const char *name, int socket_id)
cryptodev = rte_cryptodev_pmd_get_dev(dev_id);
 
if (cryptodev->data == NULL) {
-   struct rte_cryptodev_data *cryptodev_data =
-   cryptodev_globals.data[dev_id];
+   struct rte_cryptodev_data **cryptodev_data =
+   &cryptodev_globals.data[dev_id];
 
-   int retval = rte_cryptodev_data_alloc(dev_id, &cryptodev_data,
+   int retval = rte_cryptodev_data_alloc(dev_id, cryptodev_data,
socket_id);
 
-   if (retval < 0 || cryptodev_data == NULL)
+   if (retval < 0 || *cryptodev_data == NULL)
return NULL;
 
-   cryptodev->data = cryptodev_data;
+   cryptodev->data = *cryptodev_data;
 
strlcpy(cryptodev->data->name, name,
RTE_CRYPTODEV_NAME_MAX_LEN);
@@ -724,13 +749,20 @@ rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev)
if (cryptodev == NULL)
return -EINVAL;
 
+   uint8_t dev_id = cryptodev->data->dev_id;
+
/* Close device only if device operations have been set */
if (cryptodev->dev_ops) {
-   ret = rte_cryptodev_close(cryptodev->data->dev_id);
+   ret = rte_cryptodev_close(dev_id);
if (ret < 0)
return ret;
}
 
+   struct rte_cryptodev_data **cryptodev_data = &cryptodev_globals.data[dev_id];
+   ret = rte_cryptodev_data_free(dev_id, cryptodev_data);
+   if (ret < 0)
+   return ret;
+
cryptodev->attached = RTE_CRYPTODEV_DETACHED;
cryptodev_globals.nb_devs--;
return 0;
-- 
2.7.4
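
The name conflict described in the commit message can be illustrated with a
toy registry. This is only a sketch under stated assumptions: toy_zone_name(),
toy_zone_reserve() and toy_zone_free() are hypothetical helpers that mimic
"one zone per name" and do not call the rte_memzone API. Only the name format
"rte_cryptodev_data_%u" is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define ZONE_NAME_LEN 64
#define MAX_ZONES 4

/* Toy stand-in for the memzone table: a zone is identified by its name. */
struct toy_zone {
	char name[ZONE_NAME_LEN];
	int in_use;
};

static struct toy_zone zones[MAX_ZONES];

/* Build the per-device zone name with the same format as the patch. */
int
toy_zone_name(char *buf, size_t len, unsigned int dev_id)
{
	int n = snprintf(buf, len, "rte_cryptodev_data_%u", dev_id);

	if (n < 0 || (size_t)n >= len)
		return -EINVAL;
	return 0;
}

/* Reserve a zone; fails with -EEXIST if the name is already taken. */
int
toy_zone_reserve(const char *name)
{
	size_t len;
	int i, free_slot = -1;

	assert(name != NULL);
	len = strlen(name);
	if (len >= ZONE_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < MAX_ZONES; i++) {
		if (zones[i].in_use && strcmp(zones[i].name, name) == 0)
			return -EEXIST;
		if (!zones[i].in_use && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOSPC;

	memcpy(zones[free_slot].name, name, len + 1);
	zones[free_slot].in_use = 1;
	return 0;
}

/* Free a zone by name; fails with -ENOENT if it is not reserved. */
int
toy_zone_free(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < MAX_ZONES; i++) {
		if (zones[i].in_use && strcmp(zones[i].name, name) == 0) {
			zones[i].in_use = 0;
			return 0;
		}
	}
	return -ENOENT;
}
```

Reserving the same device name twice fails until the first reservation is
freed, which is the situation the patch resolves on device release.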



[dpdk-dev] eal/pci: Improve automatic selection of IOVA mode

2019-05-30 Thread Ben Walker
In SPDK, not all drivers are registered with DPDK at startup.
Previously, that meant DPDK always chose to set itself up in IOVA_PA
mode. Instead, when the correct IOVA choice is unclear based on the
devices and drivers known to DPDK at startup, use other heuristics
(such as whether /proc/self/pagemap is accessible) to make a better
choice.

This enables SPDK to run as an unprivileged user again without requiring
users to explicitly set the iova mode on the command line.
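
A rough sketch of the pagemap heuristic mentioned above, assuming a Linux
system. pagemap_exposes_pfn() is a hypothetical helper written for
illustration and is not the code in this series. It relies on the documented
pagemap entry layout (bits 0-54 hold the page frame number, bit 63 marks the
page as present). On recent kernels an unprivileged reader is expected to see
the frame number as zero.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bits 0-54 of a pagemap entry hold the page frame number. */
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1)
/* Bit 63 is set when the page is present in RAM. */
#define PAGEMAP_PRESENT (UINT64_C(1) << 63)

/*
 * Returns 1 if the frame number of a touched stack page can be read from
 * /proc/self/pagemap, 0 if the kernel hides it, -1 on any I/O error
 * (including systems without /proc/self/pagemap).
 */
int
pagemap_exposes_pfn(void)
{
	volatile uint64_t probe = 1; /* written, so its page is resident */
	uint64_t entry = 0;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* One 64-bit entry per virtual page, indexed by page number. */
	offset = (off_t)((uintptr_t)&probe / (uintptr_t)page_size) *
		(off_t)sizeof(entry);
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	n = read(fd, &entry, sizeof(entry));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;

	if ((entry & PAGEMAP_PRESENT) == 0)
		return 0;
	return (entry & PAGEMAP_PFN_MASK) != 0;
}
```

A result of 0 or -1 would correspond to the "physical addresses are not
accessible" case, where falling back to IOVA_VA is the only workable choice.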




[dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
This is in preparation for future simplifications. The
functions are simply inlined for now.

Signed-off-by: Ben Walker 
Change-Id: I129992c9b44f4575a28cc05b78297e15b6be4249
---
 drivers/bus/pci/linux/pci.c | 176 +++-
 1 file changed, 71 insertions(+), 105 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index c99d523f0..d3177916a 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -497,86 +497,6 @@ rte_pci_scan(void)
return -1;
 }
 
-/*
- * Is pci device bound to any kdrv
- */
-static inline int
-pci_one_device_is_bound(void)
-{
-   struct rte_pci_device *dev = NULL;
-   int ret = 0;
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (dev->kdrv == RTE_KDRV_UNKNOWN ||
-   dev->kdrv == RTE_KDRV_NONE) {
-   continue;
-   } else {
-   ret = 1;
-   break;
-   }
-   }
-   return ret;
-}
-
-/*
- * Any one of the device bound to uio
- */
-static inline int
-pci_one_device_bound_uio(void)
-{
-   struct rte_pci_device *dev = NULL;
-   struct rte_devargs *devargs;
-   int need_check;
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   devargs = dev->device.devargs;
-
-   need_check = 0;
-   switch (rte_pci_bus.bus.conf.scan_mode) {
-   case RTE_BUS_SCAN_WHITELIST:
-   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
-   need_check = 1;
-   break;
-   case RTE_BUS_SCAN_UNDEFINED:
-   case RTE_BUS_SCAN_BLACKLIST:
-   if (devargs == NULL ||
-   devargs->policy != RTE_DEV_BLACKLISTED)
-   need_check = 1;
-   break;
-   }
-
-   if (!need_check)
-   continue;
-
-   if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-  dev->kdrv == RTE_KDRV_UIO_GENERIC) {
-   return 1;
-   }
-   }
-   return 0;
-}
-
-/*
- * Any one of the device has iova as va
- */
-static inline int
-pci_one_device_has_iova_va(void)
-{
-   struct rte_pci_device *dev = NULL;
-   struct rte_pci_driver *drv = NULL;
-
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (dev->kdrv == RTE_KDRV_VFIO &&
-   rte_pci_match(drv, dev))
-   return 1;
-   }
-   }
-   }
-   return 0;
-}
-
 #if defined(RTE_ARCH_X86)
 static bool
 pci_one_device_iommu_support_va(struct rte_pci_device *dev)
@@ -641,14 +561,76 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 #endif
 
 /*
- * All devices IOMMUs support VA as IOVA
+ * Get iommu class of PCI devices on the bus.
  */
-static bool
-pci_devices_iommu_support_va(void)
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
 {
+   bool is_bound = false;
+   bool is_vfio_noiommu_enabled = true;
+   bool has_iova_va = false;
+   bool is_bound_uio = false;
+   bool iommu_no_va = false;
+   bool break_out;
+   bool need_check;
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
+   struct rte_devargs *devargs;
+
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+   dev->kdrv == RTE_KDRV_NONE) {
+   continue;
+   } else {
+   is_bound = true;
+   break;
+   }
+   }
+   if (!is_bound)
+   return RTE_IOVA_DC;
 
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
+   if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (dev->kdrv == RTE_KDRV_VFIO &&
+   rte_pci_match(drv, dev)) {
+   has_iova_va = true;
+   break;
+   }
+   }
+
+   if (has_iova_va)
+   break;
+   }
+   }
+
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   devargs = dev->device.devargs;
+
+   need_check = false;
+   switch (rte_pci_bus.bus.conf.scan_mode) {
+   case RTE_BUS_SCAN_WHITELIST:
+   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+   need_check = true;
+   break;
+   case RTE_BUS_SCAN_UNDEFINED:
+   case RTE_BUS_SCAN_BLACKLIST:
+   if (

[dpdk-dev] [PATCH 05/12] eal/pci: Add function pci_ignore_device

2019-05-30 Thread Ben Walker
This performs a check for whether the device should be ignored
due to whitelist or blacklist. This check eventually needs
to apply to all of the other checks in rte_pci_get_iommu_class.

Signed-off-by: Ben Walker 
Change-Id: I8e63e4c2e4199f34561ea1d911e13d6d74a47322
---
 drivers/bus/pci/linux/pci.c | 44 +
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index b7a66d717..f269b6a64 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -560,6 +560,29 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 }
 #endif
 
+static bool
+pci_ignore_device(struct rte_pci_device *dev)
+{
+   struct rte_devargs *devargs;
+
+   devargs = dev->device.devargs;
+
+   switch (rte_pci_bus.bus.conf.scan_mode) {
+   case RTE_BUS_SCAN_WHITELIST:
+   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+   return false;
+   break;
+   case RTE_BUS_SCAN_UNDEFINED:
+   case RTE_BUS_SCAN_BLACKLIST:
+   if (devargs == NULL ||
+   devargs->policy != RTE_DEV_BLACKLISTED)
+   return false;
+   break;
+   }
+
+   return true;
+}
+
 /*
  * Get iommu class of PCI devices on the bus.
  */
@@ -571,10 +594,9 @@ rte_pci_get_iommu_class(void)
bool has_iova_va = false;
bool is_bound_uio = false;
bool iommu_no_va = false;
-   bool need_check;
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
-   struct rte_devargs *devargs;
+
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (dev->kdrv == RTE_KDRV_UNKNOWN ||
@@ -612,23 +634,7 @@ rte_pci_get_iommu_class(void)
}
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
-   devargs = dev->device.devargs;
-
-   need_check = false;
-   switch (rte_pci_bus.bus.conf.scan_mode) {
-   case RTE_BUS_SCAN_WHITELIST:
-   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
-   need_check = true;
-   break;
-   case RTE_BUS_SCAN_UNDEFINED:
-   case RTE_BUS_SCAN_BLACKLIST:
-   if (devargs == NULL ||
-   devargs->policy != RTE_DEV_BLACKLISTED)
-   need_check = true;
-   break;
-   }
-
-   if (!need_check)
+   if (pci_ignore_device(dev))
continue;
 
if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-- 
2.20.1
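
The whitelist/blacklist decision in pci_ignore_device() can be tried out in
isolation. The sketch below is an assumption-laden restatement, not the DPDK
code: the enums, struct devargs and ignore_device() are local stand-ins so
that it compiles without any DPDK headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the DPDK scan mode and device policy enums. */
enum scan_mode { SCAN_UNDEFINED, SCAN_WHITELIST, SCAN_BLACKLIST };
enum dev_policy { DEV_WHITELISTED, DEV_BLACKLISTED };

/* Stand-in for the device arguments; NULL means none were given. */
struct devargs {
	enum dev_policy policy;
};

/* Returns true if the device must be skipped under the given scan mode. */
bool
ignore_device(enum scan_mode mode, const struct devargs *da)
{
	switch (mode) {
	case SCAN_WHITELIST:
		/* Only explicitly whitelisted devices are considered. */
		return !(da != NULL && da->policy == DEV_WHITELISTED);
	case SCAN_UNDEFINED:
	case SCAN_BLACKLIST:
		/* Everything is considered unless explicitly blacklisted. */
		return da != NULL && da->policy == DEV_BLACKLISTED;
	}
	/* Unknown scan mode: be conservative and skip the device. */
	return true;
}
```

Expressing the check as "should this device be ignored" lets every loop start
with the same early continue, which is what the later patches rely on.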



[dpdk-dev] [PATCH 07/12] eal/pci: Reverse if check in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
It's simpler to reverse the if statement here, especially
with an upcoming simplification.

Signed-off-by: Ben Walker 
Change-Id: I6cff80231032304f3f865fdf38157554fad7fd07
---
 drivers/bus/pci/linux/pci.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebe62f140..f678d2318 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -601,10 +601,8 @@ rte_pci_get_iommu_class(void)
if (pci_ignore_device(dev))
continue;
 
-   if (dev->kdrv == RTE_KDRV_UNKNOWN ||
-   dev->kdrv == RTE_KDRV_NONE) {
-   continue;
-   } else {
+   if (dev->kdrv != RTE_KDRV_UNKNOWN &&
+   dev->kdrv != RTE_KDRV_NONE) {
is_bound = true;
break;
}
-- 
2.20.1



[dpdk-dev] [PATCH 11/12] eal/pci: rte_pci_get_iommu_class handles no drivers

2019-05-30 Thread Ben Walker
In the case where no drivers are registered with the system,
rte_pci_get_iommu_class should return RTE_IOVA_DC.

Signed-off-by: Ben Walker 
Change-Id: Ia5b0cae100cfcfe46a9e4996328f9746ce33cfd3
---
 drivers/bus/pci/linux/pci.c | 79 ++---
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 09af66571..abc21061f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -589,49 +589,68 @@ pci_ignore_device(struct rte_pci_device *dev)
 enum rte_iova_mode
 rte_pci_get_iommu_class(void)
 {
-   bool is_bound = false;
-   bool is_vfio_noiommu_enabled = true;
-   bool has_iova_va = false;
-   bool is_bound_uio = false;
-   bool iommu_no_va = false;
-   struct rte_pci_device *dev = NULL;
-   struct rte_pci_driver *drv = NULL;
+   struct rte_pci_device *dev;
+   struct rte_pci_driver *drv;
+   struct rte_pci_addr *addr;
+   enum rte_iova_mode iova_mode;
+
+   iova_mode = RTE_IOVA_DC;
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (pci_ignore_device(dev))
continue;
 
+   addr = &dev->addr;
+
switch (dev->kdrv) {
case RTE_KDRV_UNKNOWN:
case RTE_KDRV_NONE:
break;
case RTE_KDRV_VFIO:
-   is_bound = true;
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
 
-   /*
-   * just one PCI device needs to be checked out because
-   * the IOMMU hardware is the same for all of them.
-   */
-   iommu_no_va = !pci_one_device_iommu_support_va(dev);
+   if ((drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0)
+   continue;
 
-   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-   has_iova_va = true;
-   break;
+   if (!pci_one_device_iommu_support_va(dev)) {
+   RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT " wanted IOVA as VA, but ",
+   addr->domain, addr->bus, addr->devid, addr->function);
+   RTE_LOG(WARNING, EAL, "IOMMU does not support it.\n");
+   iova_mode = RTE_IOVA_PA;
+   }
+#ifdef VFIO_PRESENT
+   else if (rte_vfio_noiommu_is_enabled()) {
+   RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT " wanted IOVA as VA, but ",
+   addr->domain, addr->bus, addr->devid, addr->function);
+   RTE_LOG(WARNING, EAL, "vfio-noiommu is enabled.\n");
+   iova_mode = RTE_IOVA_PA;
+#endif
+   } else if (iova_mode == RTE_IOVA_PA) {
+   RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT " wanted IOVA as VA, but ",
+   addr->domain, addr->bus, addr->devid, addr->function);
+   RTE_LOG(WARNING, EAL, "other devices require PA.\n");
+   } else {
+   iova_mode = RTE_IOVA_VA;
}
}
break;
case RTE_KDRV_IGB_UIO:
case RTE_KDRV_UIO_GENERIC:
case RTE_KDRV_NIC_UIO:
-   is_bound = true;
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
 
-   is_bound_uio = true;
+   if (iova_mode == RTE_IOVA_VA) {
+   RTE_LOG(WARNING, EAL, "Some devices wanted IOVA as VA, but ");
+   RTE_LOG(WARNING, EAL, "device " PCI_PRI_FMT " requires PA.\n",
+   addr->domain, addr->bus, addr->devid, addr->function);
+
+   }
+
+   iova_mode = RTE_IOVA_PA;
break;
}
break;
@@ -639,29 +658,7 @@ rte_pci_get_iommu_class(void)
}
}
 
-   if (!is_bound)
-   return RTE_IOVA_DC;
-
-#ifdef VFIO_PRESENT
-   is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
-  

[dpdk-dev] [PATCH 03/12] eal/pci: Rework loops in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
Make all of the loops first iterate over devices, then
drivers. This is in preparation for combining them
into a single loop.

Signed-off-by: Ben Walker 
Change-Id: Ifb2bfcc60570a5d5a13481be3da0fc74bf00ef1f
---
 drivers/bus/pci/linux/pci.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index d3177916a..70815e4f0 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -589,10 +589,10 @@ rte_pci_get_iommu_class(void)
if (!is_bound)
return RTE_IOVA_DC;
 
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (dev->kdrv == RTE_KDRV_VFIO &&
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (dev->kdrv == RTE_KDRV_VFIO) {
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
+   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA &&
rte_pci_match(drv, dev)) {
has_iova_va = true;
break;
@@ -631,8 +631,8 @@ rte_pci_get_iommu_class(void)
}
 
break_out = false;
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
/*
-- 
2.20.1



[dpdk-dev] [PATCH 01/12] eal: Make rte_eal_using_phys_addrs work sooner

2019-05-30 Thread Ben Walker
This function only returned the correct answer after
a call to initialize the memory subsystem. Make it work
prior to that.

Signed-off-by: Ben Walker 
Change-Id: I8f3c5128fbf5da884a956bbcc72c5a13564825d5
---
 lib/librte_eal/linux/eal/eal_memory.c | 63 ---
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_memory.c b/lib/librte_eal/linux/eal/eal_memory.c
index 416dad898..0c07bb946 100644
--- a/lib/librte_eal/linux/eal/eal_memory.c
+++ b/lib/librte_eal/linux/eal/eal_memory.c
@@ -66,34 +66,8 @@
  * zone as well as a physical contiguous zone.
  */
 
-static bool phys_addrs_available = true;
-
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
-static void
-test_phys_addrs_available(void)
-{
-   uint64_t tmp = 0;
-   phys_addr_t physaddr;
-
-   if (!rte_eal_has_hugepages()) {
-   RTE_LOG(ERR, EAL,
-   "Started without hugepages support, physical addresses not available\n");
-   phys_addrs_available = false;
-   return;
-   }
-
-   physaddr = rte_mem_virt2phy(&tmp);
-   if (physaddr == RTE_BAD_PHYS_ADDR) {
-   if (rte_eal_iova_mode() == RTE_IOVA_PA)
-   RTE_LOG(ERR, EAL,
-   "Cannot obtain physical addresses: %s. "
-   "Only vfio will function.\n",
-   strerror(errno));
-   phys_addrs_available = false;
-   }
-}
-
 /*
  * Get physical address of any mapped virtual address in the current process.
  */
@@ -107,7 +81,7 @@ rte_mem_virt2phy(const void *virtaddr)
off_t offset;
 
/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-   if (!phys_addrs_available)
+   if (!rte_eal_using_phys_addrs())
return RTE_BAD_IOVA;
 
/* standard page size */
@@ -1336,8 +1310,6 @@ eal_legacy_hugepage_init(void)
int nr_hugefiles, nr_hugepages = 0;
void *addr;
 
-   test_phys_addrs_available();
-
memset(used_hp, 0, sizeof(used_hp));
 
/* get pointer to global configuration */
@@ -1516,7 +1488,7 @@ eal_legacy_hugepage_init(void)
continue;
}
 
-   if (phys_addrs_available &&
+   if (rte_eal_using_phys_addrs() &&
rte_eal_iova_mode() != RTE_IOVA_VA) {
/* find physical addresses for each hugepage */
if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0) {
@@ -1735,8 +1707,6 @@ eal_hugepage_init(void)
uint64_t memory[RTE_MAX_NUMA_NODES];
int hp_sz_idx, socket_id;
 
-   test_phys_addrs_available();
-
memset(used_hp, 0, sizeof(used_hp));
 
for (hp_sz_idx = 0;
@@ -1879,8 +1849,6 @@ eal_legacy_hugepage_attach(void)
"into secondary processes\n");
}
 
-   test_phys_addrs_available();
-
fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY);
if (fd_hugepage < 0) {
RTE_LOG(ERR, EAL, "Could not open %s\n",
@@ -2020,7 +1988,32 @@ rte_eal_hugepage_attach(void)
 int
 rte_eal_using_phys_addrs(void)
 {
-   return phys_addrs_available;
+   static int using_phys_addrs = -1;
+   uint64_t tmp = 0;
+   phys_addr_t physaddr;
+
+   if (using_phys_addrs != -1)
+   return using_phys_addrs;
+
+   /* Set the default to 1 */
+   using_phys_addrs = 1;
+
+   if (!rte_eal_has_hugepages()) {
+   RTE_LOG(ERR, EAL,
+   "Started without hugepages support, physical addresses not available\n");
+   using_phys_addrs = 0;
+   return using_phys_addrs;
+   }
+
+   physaddr = rte_mem_virt2phy(&tmp);
+   if (physaddr == RTE_BAD_PHYS_ADDR) {
+   if (rte_eal_iova_mode() == RTE_IOVA_PA)
+   RTE_LOG(ERR, EAL,
+   "Cannot obtain physical addresses. Only vfio will function.\n");
+   using_phys_addrs = 0;
+   }
+
+   return using_phys_addrs;
 }
 
 static int __rte_unused
-- 
2.20.1
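
The lazy evaluation used by the reworked rte_eal_using_phys_addrs() can be
reduced to a small pattern. The sketch below is illustrative only:
using_phys_addrs(), probe_phys_addr() and probe_call_count() are hypothetical
names, and the probe merely pretends to succeed. It also shows why the flag is
given a provisional value of 1 before probing: in the patch the probe
(rte_mem_virt2phy) consults the flag itself, so the nested call has to find a
value other than -1 in order to terminate.

```c
#include <stdbool.h>

/* Forward declaration: the probe below consults the flag re-entrantly. */
int using_phys_addrs(void);

/* Counts how often the probe actually runs. */
static int probe_calls;

/*
 * Hypothetical stand-in for the address translation probe. Like the
 * function in the patch, it bails out early if the flag says that
 * physical addresses are unavailable.
 */
static bool
probe_phys_addr(void)
{
	probe_calls++;
	if (!using_phys_addrs())
		return false;
	return true; /* pretend a physical address was obtained */
}

/*
 * Lazily evaluated flag: -1 means "not computed yet". The first caller
 * pays for the probe, later callers get the cached answer.
 * Not thread-safe as written.
 */
int
using_phys_addrs(void)
{
	static int cached = -1;

	if (cached != -1)
		return cached;

	/* Provisional value so the nested call from the probe returns. */
	cached = 1;
	if (!probe_phys_addr())
		cached = 0;
	return cached;
}

/* Lets a caller observe how many times the probe ran. */
int
probe_call_count(void)
{
	return probe_calls;
}
```

Because the result is computed on first use, callers no longer depend on the
memory subsystem having been initialized beforehand.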



[dpdk-dev] [PATCH 06/12] eal/pci: Correctly test whitelist/blacklist in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
All of the checks should respect the white and black lists.

Signed-off-by: Ben Walker 
Change-Id: Ie66176bea49987d1fc0a03dbee2638d9dd6efbc5
---
 drivers/bus/pci/linux/pci.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index f269b6a64..ebe62f140 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -597,8 +597,10 @@ rte_pci_get_iommu_class(void)
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
 
-
FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (pci_ignore_device(dev))
+   continue;
+
if (dev->kdrv == RTE_KDRV_UNKNOWN ||
dev->kdrv == RTE_KDRV_NONE) {
continue;
@@ -611,6 +613,9 @@ rte_pci_get_iommu_class(void)
return RTE_IOVA_DC;
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (pci_ignore_device(dev))
+   continue;
+
if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
-- 
2.20.1



[dpdk-dev] [PATCH 09/12] eal/pci: Simplify rte_pci_get_iommu_class by using a switch

2019-05-30 Thread Ben Walker
Take several independent if statements and convert to a
switch statement.

Signed-off-by: Ben Walker 
Change-Id: Ia77c88ea484b529e8b0c9e09e8ef22cf3210e669
---
 drivers/bus/pci/linux/pci.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 11e2e4d1b..41fd82988 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -601,12 +601,12 @@ rte_pci_get_iommu_class(void)
if (pci_ignore_device(dev))
continue;
 
-   if (dev->kdrv != RTE_KDRV_UNKNOWN &&
-   dev->kdrv != RTE_KDRV_NONE) {
+   switch (dev->kdrv) {
+   case RTE_KDRV_UNKNOWN:
+   case RTE_KDRV_NONE:
+   break;
+   case RTE_KDRV_VFIO:
is_bound = true;
-   }
-
-   if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
@@ -622,11 +622,14 @@ rte_pci_get_iommu_class(void)
break;
}
}
-   }
-
-   if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-  dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+   break;
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   is_bound = true;
is_bound_uio = true;
+   break;
+
}
}
 
-- 
2.20.1



[dpdk-dev] [PATCH 04/12] eal/pci: Collapse two loops in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
Two of these loops easily collapse into a single loop.
This sets the stage for future simplifications.

Signed-off-by: Ben Walker 
Change-Id: I3353f2e3585808cebff3f11805f96e4a1cc7fb3a
---
 drivers/bus/pci/linux/pci.c | 31 ++-
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 70815e4f0..b7a66d717 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -571,7 +571,6 @@ rte_pci_get_iommu_class(void)
bool has_iova_va = false;
bool is_bound_uio = false;
bool iommu_no_va = false;
-   bool break_out;
bool need_check;
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
@@ -592,8 +591,16 @@ rte_pci_get_iommu_class(void)
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA &&
-   rte_pci_match(drv, dev)) {
+   if (!rte_pci_match(drv, dev))
+   continue;
+
+   /*
+   * just one PCI device needs to be checked out because
+   * the IOMMU hardware is the same for all of them.
+   */
+   iommu_no_va = !pci_one_device_iommu_support_va(dev);
+
+   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
has_iova_va = true;
break;
}
@@ -630,24 +637,6 @@ rte_pci_get_iommu_class(void)
}
}
 
-   break_out = false;
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (!rte_pci_match(drv, dev))
-   continue;
-   /*
-* just one PCI device needs to be checked out because
-* the IOMMU hardware is the same for all of them.
-*/
-   iommu_no_va = !pci_one_device_iommu_support_va(dev);
-   break_out = true;
-   break;
-   }
-
-   if (break_out)
-   break;
-   }
-
 #ifdef VFIO_PRESENT
is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
true : false;
-- 
2.20.1



[dpdk-dev] [PATCH 12/12] eal: If bus can't decide PA or VA, try to access PA

2019-05-30 Thread Ben Walker
If the bus can't determine a preference for IOVA_PA vs.
IOVA_VA by looking at the devices and drivers, as a
last resort test if physical addresses are even accessible
in /proc/self/pagemap. If they are, use IOVA_PA. If they
are not, use IOVA_VA.

Change-Id: If1eeb723283b80b24bd973987054fdad62f59cbd
---
 lib/librte_eal/common/eal_common_bus.c |  4 
 lib/librte_eal/linux/eal/eal.c | 28 +++---
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index c8f1901f0..77f1be1b4 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -237,10 +237,6 @@ rte_bus_get_iommu_class(void)
mode |= bus->get_iommu_class();
}
 
-   if (mode != RTE_IOVA_VA) {
-   /* Use default IOVA mode */
-   mode = RTE_IOVA_PA;
-   }
return mode;
 }
 
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 161399619..283aae120 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -948,6 +948,7 @@ rte_eal_init(int argc, char **argv)
static char logid[PATH_MAX];
char cpuset[RTE_CPU_AFFINITY_STR_LEN];
char thread_name[RTE_MAX_THREAD_NAME_LEN];
+   enum rte_iova_mode iova_mode;
 
/* checks if the machine is adequate */
if (!rte_cpu_is_supported()) {
@@ -1037,18 +1038,31 @@ rte_eal_init(int argc, char **argv)
 
/* if no EAL option "--iova-mode=", use bus IOVA scheme */
if (internal_config.iova_mode == RTE_IOVA_DC) {
-   /* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
-   rte_eal_get_configuration()->iova_mode =
-   rte_bus_get_iommu_class();
+   /* autodetect the IOVA mapping mode */
+   iova_mode = rte_bus_get_iommu_class();
 
/* Workaround for KNI which requires physical address to work */
-   if (rte_eal_get_configuration()->iova_mode == RTE_IOVA_VA &&
+   if (iova_mode == RTE_IOVA_VA &&
rte_eal_check_module("rte_kni") == 1) {
-   rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
+   iova_mode = RTE_IOVA_PA;
RTE_LOG(WARNING, EAL,
-   "Some devices want IOVA as VA but PA will be used because.. "
-   "KNI module inserted\n");
+   "Some devices want IOVA as VA but PA will be"
+   " used because KNI module inserted\n");
+   }
+
+   if (iova_mode == RTE_IOVA_DC) {
+   /* If the bus doesn't care, check if physical addresses are
+* accessible. */
+   if (rte_eal_using_phys_addrs()) {
+   /* Physical addresses are available, so the safest
+* choice is to use those. */
+   iova_mode = RTE_IOVA_PA;
+   } else {
+   iova_mode = RTE_IOVA_VA;
+   }
}
+
+   rte_eal_get_configuration()->iova_mode = iova_mode;
} else {
rte_eal_get_configuration()->iova_mode =
internal_config.iova_mode;
-- 
2.20.1
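
For readers following the thread, the decision order that the eal.c hunk
above introduces can be summarised in a small standalone sketch. The enum
and function names below are hypothetical stand-ins, not the DPDK
definitions, and the sketch assumes the bus answers with exactly one of
"don't care", PA or VA (the real bus query ORs several answers together).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rte_iova_mode values; not the DPDK enum. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Mirrors the order of checks in the patched rte_eal_init():
 *   1. an explicit --iova-mode from the user wins;
 *   2. otherwise the bus answer is taken;
 *   3. a loaded KNI module turns a VA answer into PA;
 *   4. if the buses do not care, PA is preferred when physical addresses
 *      are accessible, and VA is used otherwise.
 */
static enum iova_mode
select_iova_mode(enum iova_mode user_choice, enum iova_mode bus_choice,
		 bool kni_loaded, bool phys_addrs_usable)
{
	enum iova_mode mode;

	/* Assumption of this sketch: a single-valued bus answer. */
	assert(bus_choice == IOVA_DC || bus_choice == IOVA_PA ||
	       bus_choice == IOVA_VA);

	if (user_choice != IOVA_DC)
		return user_choice;

	mode = bus_choice;
	if (mode == IOVA_VA && kni_loaded)
		mode = IOVA_PA;
	if (mode == IOVA_DC)
		mode = phys_addrs_usable ? IOVA_PA : IOVA_VA;
	return mode;
}
```

The point of the ordering is that the "don't care" fallback only runs after
the KNI workaround, so a system without usable physical addresses ends up
with VA rather than an unusable PA default.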



[dpdk-dev] [PATCH 10/12] eal/pci: Finding a device bound to UIO does not force PA

2019-05-30 Thread Ben Walker
If a device is found that is bound to the UIO driver,
only force IOVA_PA if there is a driver registered to use it.

Signed-off-by: Ben Walker 
Change-Id: I8015f11a33ab1b7662bf374d6944eff8d7a74a07
---
 drivers/bus/pci/linux/pci.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 41fd82988..09af66571 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -627,7 +627,13 @@ rte_pci_get_iommu_class(void)
case RTE_KDRV_UIO_GENERIC:
case RTE_KDRV_NIC_UIO:
is_bound = true;
-   is_bound_uio = true;
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
+   if (!rte_pci_match(drv, dev))
+   continue;
+
+   is_bound_uio = true;
+   break;
+   }
break;
 
}
-- 
2.20.1
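
A minimal standalone sketch of the rule this patch describes: a UIO-bound
device only counts towards forcing PA if some registered driver would
actually claim it. All types and names here are hypothetical
simplifications; the real code uses rte_pci_device, rte_pci_driver and
rte_pci_match() as shown in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types for illustration only. */
struct sketch_pci_id { uint16_t vendor; uint16_t device; };

struct sketch_driver {
	const struct sketch_pci_id *ids;
	size_t nb_ids;
};

struct sketch_device {
	struct sketch_pci_id id;
	bool bound_to_uio;
};

/* True if the driver's ID table contains the device's vendor/device pair. */
static bool
sketch_match(const struct sketch_driver *drv, const struct sketch_device *dev)
{
	size_t i;

	assert(drv != NULL && dev != NULL);
	for (i = 0; i < drv->nb_ids; i++) {
		if (drv->ids[i].vendor == dev->id.vendor &&
		    drv->ids[i].device == dev->id.device)
			return true;
	}
	return false;
}

/* A UIO-bound device forces PA only if at least one driver matches it. */
static bool
sketch_uio_forces_pa(const struct sketch_device *devs, size_t nb_devs,
		     const struct sketch_driver *drvs, size_t nb_drvs)
{
	size_t d, k;

	for (d = 0; d < nb_devs; d++) {
		if (!devs[d].bound_to_uio)
			continue;
		for (k = 0; k < nb_drvs; k++) {
			if (sketch_match(&drvs[k], &devs[d]))
				return true;
		}
	}
	return false;
}
```

With this rule, a device that happens to be bound to UIO but that no loaded
driver will use no longer influences the IOVA mode.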



[dpdk-dev] [PATCH 08/12] eal/pci: Collapse loops in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
The three loops can now be easily combined into one.

This is slightly less efficient than before because it
doesn't break out early. But that can be addressed
later.

Signed-off-by: Ben Walker 
Change-Id: Ic97155bb478dddbcbeaa6d51947684ffef219a52
---
 drivers/bus/pci/linux/pci.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index f678d2318..11e2e4d1b 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -604,15 +604,7 @@ rte_pci_get_iommu_class(void)
if (dev->kdrv != RTE_KDRV_UNKNOWN &&
dev->kdrv != RTE_KDRV_NONE) {
is_bound = true;
-   break;
}
-   }
-   if (!is_bound)
-   return RTE_IOVA_DC;
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (pci_ignore_device(dev))
-   continue;
 
if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
@@ -630,15 +622,7 @@ rte_pci_get_iommu_class(void)
break;
}
}
-
-   if (has_iova_va)
-   break;
}
-   }
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (pci_ignore_device(dev))
-   continue;
 
if (dev->kdrv == RTE_KDRV_IGB_UIO ||
   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
@@ -646,6 +630,9 @@ rte_pci_get_iommu_class(void)
}
}
 
+   if (!is_bound)
+   return RTE_IOVA_DC;
+
 #ifdef VFIO_PRESENT
is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
true : false;
-- 
2.20.1
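
To illustrate the shape of the change, here is a rough sketch of one pass
that gathers all the flags at once and defers the "nothing bound" check
until after the loop. The enum, struct and function names are hypothetical,
and the per-driver matching done in the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kernel-driver kinds; not the RTE_KDRV_* values. */
enum sketch_kdrv { SKETCH_KDRV_NONE, SKETCH_KDRV_VFIO, SKETCH_KDRV_UIO };

struct sketch_scan {
	bool is_bound;	/* any device bound to a kernel driver */
	bool has_vfio;	/* any device bound to VFIO */
	bool has_uio;	/* any device bound to a UIO driver */
};

/* One loop replaces three; there is no early exit, as the commit notes. */
static struct sketch_scan
sketch_scan_devices(const enum sketch_kdrv *kdrv, size_t n)
{
	struct sketch_scan r = { false, false, false };
	size_t i;

	assert(kdrv != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (kdrv[i] != SKETCH_KDRV_NONE)
			r.is_bound = true;
		if (kdrv[i] == SKETCH_KDRV_VFIO)
			r.has_vfio = true;
		if (kdrv[i] == SKETCH_KDRV_UIO)
			r.has_uio = true;
	}
	return r;
}
```

The caller would then test is_bound first and return "don't care" when it
is false, which matches where the patch moves that check.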



Re: [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 10:48:09 -0700
Ben Walker  wrote:

> This is in preparation for future simplifications. The
> functions are simply inlined for now.
> 
> Signed-off-by: Ben Walker 
> Change-Id: I129992c9b44f4575a28cc05b78297e15b6be4249

Please don't inline any functions that are not in the fast path.
The compiler will do it anyway.



Re: [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class

2019-05-30 Thread Walker, Benjamin
> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Thursday, May 30, 2019 10:57 AM
> To: Walker, Benjamin 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into
> rte_pci_get_iommu_class
> 
> On Thu, 30 May 2019 10:48:09 -0700
> Ben Walker  wrote:
> 
> > This is in preparation for future simplifications. The functions are
> > simply inlined for now.
> >
> > Signed-off-by: Ben Walker 
> > Change-Id: I129992c9b44f4575a28cc05b78297e15b6be4249
> 
> Please don't inline any functions that are not in the fast path.
> The compiler will do it anyway.

That's not what I mean by inline. I didn't mark the functions inline - I copied
their contents into the single place they are called. This patch is a setup
patch for a later one that refactors the way this function works, and doing this
makes the diff easier to read.
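
The distinction can be shown with a small sketch. Nothing below is taken
from the patch; the function names are hypothetical and the logic is
reduced to a single check. The "before" version keeps the check in a helper
with one caller, and the "after" version copies the helper body into that
caller without using the inline keyword at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Before: the check lives in a helper that has a single caller. */
static bool
sketch_any_bound(const int *kdrv, size_t n)
{
	size_t i;

	assert(kdrv != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (kdrv[i] != 0)
			return true;
	return false;
}

static int
sketch_class_before(const int *kdrv, size_t n)
{
	if (!sketch_any_bound(kdrv, n))
		return 0;	/* "don't care" */
	return 1;
}

/* After: the helper body is pasted into its caller; behaviour is the same. */
static int
sketch_class_after(const int *kdrv, size_t n)
{
	bool is_bound = false;
	size_t i;

	assert(kdrv != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (kdrv[i] != 0) {
			is_bound = true;
			break;
		}
	}
	if (!is_bound)
		return 0;	/* "don't care" */
	return 1;
}
```

Having the loop visible in the caller is what lets a follow-up patch merge
it with neighbouring loops in a diff that is easy to review.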



Re: [dpdk-dev] [PATCH v4 2/5] eal: add lcore accessors

2019-05-30 Thread Bruce Richardson
On Thu, May 30, 2019 at 07:00:36PM +0200, David Marchand wrote:
>On Thu, May 30, 2019 at 3:39 PM Thomas Monjalon
><[1]tho...@monjalon.net> wrote:
> 
>  30/05/2019 12:11, Bruce Richardson:
>  > On Thu, May 30, 2019 at 09:40:08AM +0200, Thomas Monjalon wrote:
>  > > 30/05/2019 09:31, David Marchand:
>  > > > On Thu, May 30, 2019 at 12:51 AM Stephen Hemminger <
>  > > > [2]step...@networkplumber.org> wrote:
>  > > >
>  > > > > On Thu, 30 May 2019 00:46:30 +0200
>  > > > > Thomas Monjalon <[3]tho...@monjalon.net> wrote:
>  > > > >
>  > > > > > 23/05/2019 15:58, David Marchand:
>  > > > > > > From: Stephen Hemminger <[4]step...@networkplumber.org>
>  > > > > > >
>  > > > > > > The fields of the internal EAL core configuration are
>  currently
>  > > > > > > laid bare as part of the API. This is not good practice
>  and limits
>  > > > > > > fixing issues with layout and sizes.
>  > > > > > >
>  > > > > > > Make new accessor functions for the fields used by
>  current drivers
>  > > > > > > and examples.
>  > > > > > [...]
>  > > > > > > +DPDK_19.08 {
>  > > > > > > +   global:
>  > > > > > > +
>  > > > > > > +   rte_lcore_cpuset;
>  > > > > > > +   rte_lcore_index;
>  > > > > > > +   rte_lcore_to_cpu_id;
>  > > > > > > +   rte_lcore_to_socket_id;
>  > > > > > > +
>  > > > > > > +} DPDK_19.05;
>  > > > > > > +
>  > > > > > >  EXPERIMENTAL {
>  > > > > > > global:
>  > > > > >
>  > > > > > Just to make sure, are we OK to introduce these functions
>  > > > > > as non-experimental?
>  > > > >
>  > > > > They were in previous releases as inlines this patch
>  converts them
>  > > > > to real functions.
>  > > > >
>  > > > >
>  > > > Well, yes and no.
>  > > >
>  > > > rte_lcore_index and rte_lcore_to_socket_id already existed, so
>  making them
>  > > > part of the ABI is fine for me.
>  > > >
>  > > > rte_lcore_to_cpu_id is new but seems quite safe in how it can
>  be used,
>  > > > adding it to the ABI is ok for me.
>  > >
>  > > It is used by DPAA and some test.
>  > > I guess adding as experimental is fine too?
>  > > I'm fine with both options, I'm just trying to apply the policy
>  > > we agreed on. Does this case deserve an exception?
>  > >
>  >
>  > While it may be a good candidate, I'm not sure how much making an
>  exception
>  > for it really matters. I'd be tempted to just mark it experimental
>  and then
>  > have it stable for the 19.11 release. What do we really lose by
>  waiting a
>  > release to stabilize it?
>  I would agree Bruce.
>  If no more comment, I will wait for a v5 of this series.
> 
>I agree that there is no reason we make an exception for those 2 new
>ones.
>But to me the existing rte_lcore_index and rte_lcore_to_socket_id must
>be marked as stable.
>This is to avoid breaking existing users that did not set
>ALLOW_EXPERIMENTAL_API.
>I will prepare a v5 later.
>--
Yes, agreed. Any existing APIs that were already present as static inlines
can go straight to stable when added to the .map file.

/Bruce


[dpdk-dev] [PATCH 0/8] raw/ioat: driver for Intel QuickData Technology

2019-05-30 Thread Bruce Richardson
This patch series adds support for the Intel QuickData Technology
device, part of the Intel I/O Acceleration Technology (Intel I/OAT). It
is exposed as a raw device that provides hardware DMA, i.e. data copies
performed in hardware.

Bruce Richardson (8):
  raw/ioat: add initial support for ioat rawdev driver
  usertools/dpdk-devbind.py: add support for IOAT devices
  raw/ioat: add register definition file
  raw/ioat: create device on probe and destroy on release
  raw/ioat: add device info function
  raw/ioat: add configure, start and stop functions
  raw/ioat: add statistics functions
  raw/ioat: add local API to perform copies

 MAINTAINERS |   7 +-
 app/test/Makefile   |   1 +
 app/test/meson.build|   4 +
 app/test/test_ioat_rawdev.c | 269 +
 config/common_armv8a_linux  |   1 +
 config/common_base  |   5 +
 config/defconfig_arm-armv7a-linuxapp-gcc|   1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
 doc/guides/rawdevs/index.rst|   1 +
 doc/guides/rawdevs/ioat_rawdev.rst  | 227 ++
 doc/guides/rel_notes/release_19_08.rst  |  11 +
 drivers/raw/Makefile|   1 +
 drivers/raw/ioat/Makefile   |  29 ++
 drivers/raw/ioat/ioat_rawdev.c  | 310 
 drivers/raw/ioat/meson.build|   9 +
 drivers/raw/ioat/rte_ioat_rawdev.h  | 228 ++
 drivers/raw/ioat/rte_ioat_spec.h| 301 +++
 drivers/raw/ioat/rte_pmd_ioat_version.map   |   4 +
 drivers/raw/meson.build |   3 +-
 mk/rte.app.mk   |   1 +
 usertools/dpdk-devbind.py   |  10 +
 21 files changed, 1422 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_ioat_rawdev.c
 create mode 100644 doc/guides/rawdevs/ioat_rawdev.rst
 create mode 100644 drivers/raw/ioat/Makefile
 create mode 100644 drivers/raw/ioat/ioat_rawdev.c
 create mode 100644 drivers/raw/ioat/meson.build
 create mode 100644 drivers/raw/ioat/rte_ioat_rawdev.h
 create mode 100644 drivers/raw/ioat/rte_ioat_spec.h
 create mode 100644 drivers/raw/ioat/rte_pmd_ioat_version.map

-- 
2.21.0



[dpdk-dev] [PATCH 1/8] raw/ioat: add initial support for ioat rawdev driver

2019-05-30 Thread Bruce Richardson
Add stubs for ioat rawdev driver support in DPDK, specifically:

  * makefile and meson build hooks
  * initial public header file
  * rawdev main C file, with probe and release functions
  * release note update announcing the driver
  * initial documentation for the new section in the rawdev doc
  * unit test stubs for device unit tests

Signed-off-by: Bruce Richardson 
---
 MAINTAINERS |  7 +-
 app/test/Makefile   |  1 +
 app/test/meson.build|  1 +
 app/test/test_ioat_rawdev.c | 22 +
 config/common_armv8a_linux  |  1 +
 config/common_base  |  5 ++
 config/defconfig_arm-armv7a-linuxapp-gcc|  1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |  1 +
 doc/guides/rawdevs/index.rst|  1 +
 doc/guides/rawdevs/ioat_rawdev.rst  | 25 ++
 doc/guides/rel_notes/release_19_08.rst  | 11 +++
 drivers/raw/Makefile|  1 +
 drivers/raw/ioat/Makefile   | 28 +++
 drivers/raw/ioat/ioat_rawdev.c  | 93 +
 drivers/raw/ioat/meson.build|  8 ++
 drivers/raw/ioat/rte_ioat_rawdev.h  | 24 ++
 drivers/raw/ioat/rte_pmd_ioat_version.map   |  4 +
 drivers/raw/meson.build |  3 +-
 mk/rte.app.mk   |  1 +
 19 files changed, 236 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_ioat_rawdev.c
 create mode 100644 doc/guides/rawdevs/ioat_rawdev.rst
 create mode 100644 drivers/raw/ioat/Makefile
 create mode 100644 drivers/raw/ioat/ioat_rawdev.c
 create mode 100644 drivers/raw/ioat/meson.build
 create mode 100644 drivers/raw/ioat/rte_ioat_rawdev.h
 create mode 100644 drivers/raw/ioat/rte_pmd_ioat_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..b613a1e74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1042,6 +1042,12 @@ M: Tianfei zhang 
 F: drivers/raw/ifpga_rawdev/
 F: doc/guides/rawdevs/ifpga_rawdev.rst
 
+IOAT Rawdev
+M: Bruce Richardson 
+F: drivers/raw/ioat/
+F: doc/guides/rawdevs/ioat_rawdev.rst
+F: app/test/test_ioat_rawdev.c
+
 NXP DPAA2 QDMA
 M: Nipun Gupta 
 F: drivers/raw/dpaa2_qdma/
@@ -1052,7 +1058,6 @@ M: Nipun Gupta 
 F: drivers/raw/dpaa2_cmdif/
 F: doc/guides/rawdevs/dpaa2_cmdif.rst
 
-
 Packet processing
 -
 
diff --git a/app/test/Makefile b/app/test/Makefile
index 68d6b4fbc..7fbdd0755 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -212,6 +212,7 @@ endif
 
 ifeq ($(CONFIG_RTE_LIBRTE_RAWDEV),y)
 SRCS-y += test_rawdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV) += test_ioat_rawdev.c
 endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 83391cef0..9867619d3 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -52,6 +52,7 @@ test_sources = files('commands.c',
'test_hash_perf.c',
'test_hash_readwrite_lf.c',
'test_interrupts.c',
+   'test_ioat_rawdev.c',
'test_ipsec.c',
'test_kni.c',
'test_kvargs.c',
diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c
new file mode 100644
index 0..bd1bb2827
--- /dev/null
+++ b/app/test/test_ioat_rawdev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#include "test.h"
+
+#ifndef RTE_LIBRTE_PMD_IOAT_RAWDEV
+
+static int
+test_ioat_rawdev(void) { return TEST_SKIPPED; }
+
+#else
+
+static int
+test_ioat_rawdev(void)
+{
+   return 0;
+}
+
+#endif /* RTE_LIBRTE_PMD_IOAT_RAWDEV */
+
+REGISTER_TEST_COMMAND(ioat_rawdev_autotest, test_ioat_rawdev);
diff --git a/config/common_armv8a_linux b/config/common_armv8a_linux
index 72091de1c..481712ebc 100644
--- a/config/common_armv8a_linux
+++ b/config/common_armv8a_linux
@@ -34,5 +34,6 @@ CONFIG_RTE_ARCH_ARM64_MEMCPY=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
 CONFIG_RTE_LIBRTE_AVP_PMD=n
+CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV=n
 
 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/config/common_base b/config/common_base
index 6f19ad5d2..2b8db4880 100644
--- a/config/common_base
+++ b/config/common_base
@@ -741,6 +741,11 @@ CONFIG_RTE_LIBRTE_PMD_DPAA2_QDMA_RAWDEV=n
 #
 CONFIG_RTE_LIBRTE_PMD_IFPGA_RAWDEV=y
 
+#
+# Compile PMD for Intel IOAT raw device
+#
+CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV=y
+
 #
 # Compile librte_ring
 #
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index c9509b274..ee158ef9d 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -54,3 +54,4 @@ CONFIG_RTE_LIBRTE_QEDE_PMD=n
 CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
 CONFIG_RTE_LIBRTE_AVP_PMD=n
 CONFIG_RTE_LIBRTE_NFP_PMD=n
+CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV=n
diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
index 7e248b755..9f3670ec0 100644
--- a/config/def

[dpdk-dev] [PATCH 2/8] usertools/dpdk-devbind.py: add support for IOAT devices

2019-05-30 Thread Bruce Richardson
In order to allow binding/unbinding of devices for use by the
ioat_rawdev, we need to update the devbind script to add a new class
of device, and add device ids for the specific HW instances.

Signed-off-by: Bruce Richardson 
---
 doc/guides/rawdevs/ioat_rawdev.rst | 11 +++
 usertools/dpdk-devbind.py  | 10 ++
 2 files changed, 21 insertions(+)

diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst
index 40ab1b466..99e757498 100644
--- a/doc/guides/rawdevs/ioat_rawdev.rst
+++ b/doc/guides/rawdevs/ioat_rawdev.rst
@@ -23,3 +23,14 @@ configurations.
 
 For builds using ``meson`` and ``ninja``, the driver will be built when the
 target platform is x86-based.
+
+Device Setup
+-
+
+The Intel\ |reg| QuickData Technology HW devices will need to be bound to a
+user-space IO driver for use. The ``dpdk-devbind.py`` script
+included with DPDK can be used to view the state of the devices and to bind
+them to a suitable DPDK-supported kernel driver. When querying the
+status of the devices, they will appear under the category of "dma
+devices", i.e. the command ``dpdk-devbind.py --status-dev dma`` can be used
+to see the state of those devices alone.
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 9e79f0d28..bd0d97df3 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -36,11 +36,17 @@
 octeontx2_npa = {'Class': '08', 'Vendor': '177d', 'Device': 'a0fb,a0fc',
   'SVendor': None, 'SDevice': None}
 
+intel_ioat_bdw = {'Class': '08', 'Vendor': '8086', 'Device': '6f20,6f21,6f22,6f23,6f24,6f25,6f26,6f27,6f2e,6f2f',
+  'SVendor': None, 'SDevice': None}
+intel_ioat_skx = {'Class': '08', 'Vendor': '8086', 'Device': '2021',
+  'SVendor': None, 'SDevice': None}
+
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 crypto_devices = [encryption_class, intel_processor_class]
 eventdev_devices = [cavium_sso, cavium_tim, octeontx2_sso]
 mempool_devices = [cavium_fpa, octeontx2_npa]
 compress_devices = [cavium_zip]
+dma_devices = [intel_ioat_bdw, intel_ioat_skx]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
@@ -595,6 +601,8 @@ def show_status():
 if status_dev == "compress" or status_dev == "all":
 show_device_status(compress_devices , "Compress")
 
+if status_dev == "dma" or status_dev == "all":
+show_device_status(dma_devices, "DMA")
 
 def parse_args():
 '''Parses the command-line arguments given by the user and takes the
@@ -670,6 +678,7 @@ def do_arg_actions():
 get_device_details(eventdev_devices)
 get_device_details(mempool_devices)
 get_device_details(compress_devices)
+get_device_details(dma_devices)
 show_status()
 
 
@@ -690,6 +699,7 @@ def main():
 get_device_details(eventdev_devices)
 get_device_details(mempool_devices)
 get_device_details(compress_devices)
+get_device_details(dma_devices)
 do_arg_actions()
 
 if __name__ == "__main__":
-- 
2.21.0



[dpdk-dev] [PATCH 3/8] raw/ioat: add register definition file

2019-05-30 Thread Bruce Richardson
Add in the list of registers for the device. File is taken from the SPDK
project:

  https://github.com/spdk/spdk/blob/master/include/spdk/ioat_spec.h

Signed-off-by: Bruce Richardson 
---
 drivers/raw/ioat/Makefile|   1 +
 drivers/raw/ioat/meson.build |   3 +-
 drivers/raw/ioat/rte_ioat_spec.h | 301 +++
 3 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 drivers/raw/ioat/rte_ioat_spec.h

diff --git a/drivers/raw/ioat/Makefile b/drivers/raw/ioat/Makefile
index 7726e310a..1e10938f3 100644
--- a/drivers/raw/ioat/Makefile
+++ b/drivers/raw/ioat/Makefile
@@ -24,5 +24,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_IOAT_RAWDEV) += ioat_rawdev.c
 
 # export include files
 SYMLINK-y-include += rte_ioat_rawdev.h
+SYMLINK-y-include += rte_ioat_spec.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ioat/meson.build b/drivers/raw/ioat/meson.build
index ba7620a68..ca23e23fc 100644
--- a/drivers/raw/ioat/meson.build
+++ b/drivers/raw/ioat/meson.build
@@ -5,4 +5,5 @@ build = dpdk_conf.has('RTE_ARCH_X86')
 sources = files('ioat_rawdev.c')
 deps += ['rawdev', 'bus_pci']
 
-install_headers('rte_ioat_rawdev.h')
+install_headers('rte_ioat_rawdev.h',
+   'rte_ioat_spec.h')
diff --git a/drivers/raw/ioat/rte_ioat_spec.h b/drivers/raw/ioat/rte_ioat_spec.h
new file mode 100644
index 0..305e36ded
--- /dev/null
+++ b/drivers/raw/ioat/rte_ioat_spec.h
@@ -0,0 +1,301 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) Intel Corporation
+ */
+
+/**
+ * \file
+ * I/OAT specification definitions
+ *
+ * Taken from ioat_spec.h from SPDK project, with prefix renames and
+ * other minor changes.
+ */
+
+#ifndef RTE_IOAT_SPEC_H
+#define RTE_IOAT_SPEC_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+#define RTE_IOAT_PCI_CHANERR_INT_OFFSET0x180
+
+#define RTE_IOAT_INTRCTRL_MASTER_INT_EN0x01
+
+#define RTE_IOAT_VER_3_00x30
+#define RTE_IOAT_VER_3_30x33
+
+/* DMA Channel Registers */
+#define RTE_IOAT_CHANCTRL_CHANNEL_PRIORITY_MASK0xF000
+#define RTE_IOAT_CHANCTRL_COMPL_DCA_EN 0x0200
+#define RTE_IOAT_CHANCTRL_CHANNEL_IN_USE   0x0100
+#define RTE_IOAT_CHANCTRL_DESCRIPTOR_ADDR_SNOOP_CONTROL0x0020
+#define RTE_IOAT_CHANCTRL_ERR_INT_EN   0x0010
+#define RTE_IOAT_CHANCTRL_ANY_ERR_ABORT_EN 0x0008
+#define RTE_IOAT_CHANCTRL_ERR_COMPLETION_EN0x0004
+#define RTE_IOAT_CHANCTRL_INT_REARM0x0001
+
+/* DMA Channel Capabilities */
+#defineRTE_IOAT_DMACAP_PB  (1 << 0)
+#defineRTE_IOAT_DMACAP_DCA (1 << 4)
+#defineRTE_IOAT_DMACAP_BFILL   (1 << 6)
+#defineRTE_IOAT_DMACAP_XOR (1 << 8)
+#defineRTE_IOAT_DMACAP_PQ  (1 << 9)
+#defineRTE_IOAT_DMACAP_DMA_DIF (1 << 10)
+
+struct rte_ioat_registers {
+   uint8_t chancnt;
+   uint8_t xfercap;
+   uint8_t genctrl;
+   uint8_t intrctrl;
+   uint32_tattnstatus;
+   uint8_t cbver;  /* 0x08 */
+   uint8_t reserved4[0x3]; /* 0x09 */
+   uint16_tintrdelay;  /* 0x0C */
+   uint16_tcs_status;  /* 0x0E */
+   uint32_tdmacapability;  /* 0x10 */
+   uint8_t reserved5[0x6C]; /* 0x14 */
+   uint16_tchanctrl;   /* 0x80 */
+   uint8_t reserved6[0x2]; /* 0x82 */
+   uint8_t chancmd;/* 0x84 */
+   uint8_t reserved3[1];   /* 0x85 */
+   uint16_tdmacount;   /* 0x86 */
+   uint64_tchansts;/* 0x88 */
+   uint64_tchainaddr;  /* 0x90 */
+   uint64_tchancmp;/* 0x98 */
+   uint8_t reserved2[0x8]; /* 0xA0 */
+   uint32_tchanerr;/* 0xA8 */
+   uint32_tchanerrmask;/* 0xAC */
+} __attribute__((packed));
+
+#define RTE_IOAT_CHANCMD_RESET 0x20
+#define RTE_IOAT_CHANCMD_SUSPEND   0x04
+
+#define RTE_IOAT_CHANSTS_STATUS0x7ULL
+#define RTE_IOAT_CHANSTS_ACTIVE0x0
+#define RTE_IOAT_CHANSTS_IDLE  0x1
+#define RTE_IOAT_CHANSTS_SUSPENDED 0x2
+#define RTE_IOAT_CHANSTS_HALTED0x3
+#define RTE_IOAT_CHANSTS_ARMED 0x4
+
+#define RTE_IOAT_CHANSTS_UNAFFILIATED_ERROR0x8ULL
+#define RTE_IOAT_CHANSTS_SOFT_ERROR0x10ULL
+
+#define RTE_IOAT_CHANSTS_COMPLETED_DESCRIPTOR_MASK (~0x3FULL)
+
+#define RTE_IOAT_CHANCMP_ALIGN 8   /* CHANCMP address must be 64-bit aligned */
+
+struct rte_ioat_generic_hw_desc {
+   uint32_t size;
+   union {
+   uint32_t control_raw;
+   struct {
+   uint32_t int_enable: 1;
+   uint32_t src_snoop_disable: 1;
+   uint32_t dest_snoop_disable: 1;
+  
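
The offset comments in the register struct above can be checked by the
compiler. The sketch below copies the field list from the hunk into a
struct with a hypothetical name and adds compile-time checks for a few of
the commented offsets, plus two helpers that apply the CHANSTS masks. It
assumes a GCC-compatible compiler for the packed attribute; the helper
names are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as rte_ioat_registers in the patch; name is hypothetical. */
struct sketch_ioat_registers {
	uint8_t  chancnt;
	uint8_t  xfercap;
	uint8_t  genctrl;
	uint8_t  intrctrl;
	uint32_t attnstatus;
	uint8_t  cbver;			/* 0x08 */
	uint8_t  reserved4[0x3];	/* 0x09 */
	uint16_t intrdelay;		/* 0x0C */
	uint16_t cs_status;		/* 0x0E */
	uint32_t dmacapability;		/* 0x10 */
	uint8_t  reserved5[0x6C];	/* 0x14 */
	uint16_t chanctrl;		/* 0x80 */
	uint8_t  reserved6[0x2];	/* 0x82 */
	uint8_t  chancmd;		/* 0x84 */
	uint8_t  reserved3[1];		/* 0x85 */
	uint16_t dmacount;		/* 0x86 */
	uint64_t chansts;		/* 0x88 */
	uint64_t chainaddr;		/* 0x90 */
	uint64_t chancmp;		/* 0x98 */
	uint8_t  reserved2[0x8];	/* 0xA0 */
	uint32_t chanerr;		/* 0xA8 */
	uint32_t chanerrmask;		/* 0xAC */
} __attribute__((packed));

/* Fail the build if the layout drifts from the commented offsets. */
static_assert(offsetof(struct sketch_ioat_registers, cbver) == 0x08, "cbver");
static_assert(offsetof(struct sketch_ioat_registers, chanctrl) == 0x80, "chanctrl");
static_assert(offsetof(struct sketch_ioat_registers, chancmd) == 0x84, "chancmd");
static_assert(offsetof(struct sketch_ioat_registers, chansts) == 0x88, "chansts");
static_assert(offsetof(struct sketch_ioat_registers, chanerr) == 0xA8, "chanerr");

/* Low three bits of CHANSTS hold the channel state (active, idle, ...). */
static inline unsigned int
sketch_chansts_state(uint64_t chansts)
{
	return (unsigned int)(chansts & 0x7ULL);
}

/* Everything above the low six bits is the completed-descriptor address. */
static inline uint64_t
sketch_chansts_last_desc(uint64_t chansts)
{
	return chansts & ~0x3FULL;
}
```

Checks like these are cheap insurance when a spec header is imported from
another project and its prefixes are renamed.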

[dpdk-dev] [PATCH 4/8] raw/ioat: create device on probe and destroy on release

2019-05-30 Thread Bruce Richardson
Add the create/destroy driver functions so that we can actually allocate
a rawdev and destroy it when done. No rawdev API functions are actually
implemented at this point.

Signed-off-by: Bruce Richardson 
---
 doc/guides/rawdevs/ioat_rawdev.rst | 11 
 drivers/raw/ioat/ioat_rawdev.c | 93 +-
 drivers/raw/ioat/rte_ioat_rawdev.h | 20 +++
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst
index 99e757498..476b0503f 100644
--- a/doc/guides/rawdevs/ioat_rawdev.rst
+++ b/doc/guides/rawdevs/ioat_rawdev.rst
@@ -34,3 +34,14 @@ them to a suitable DPDK-supported kernel driver. When querying the
 status of the devices, they will appear under the category of "dma
 devices", i.e. the command ``dpdk-devbind.py --status-dev dma`` can be used
 to see the state of those devices alone.
+
+Device Probing and Initialization
+~~
+
+Once bound to a suitable kernel device driver, the HW devices will be found
+as part of the PCI scan done at application initialization time. No vdev
+parameters need to be passed to create or initialize the device.
+
+Once probed successfully, the device will appear as a ``rawdev``, that is a
+"raw device type" inside DPDK, and can be accessed using APIs from the
+``rte_rawdev`` library.
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index d9fc3091a..b6964bccd 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2019 Intel Corporation
  */
 
+#include 
 #include 
 #include 
 
@@ -26,15 +27,101 @@ static struct rte_pci_driver ioat_pmd_drv;
 static int
 ioat_rawdev_create(const char *name, struct rte_pci_device *dev)
 {
-   RTE_SET_USED(name);
-   RTE_SET_USED(dev);
+   static const struct rte_rawdev_ops ioat_rawdev_ops = {
+   };
+
+   struct rte_rawdev *rawdev = NULL;
+   struct rte_ioat_rawdev *ioat = NULL;
+   int ret = 0;
+   int retry = 0;
+
+   if (!name) {
+   IOAT_PMD_ERR("Invalid name of the device!");
+   ret = -EINVAL;
+   goto cleanup;
+   }
+
+   /* Allocate device structure */
+   rawdev = rte_rawdev_pmd_allocate(name, sizeof(struct rte_ioat_rawdev),
+dev->device.numa_node);
+   if (rawdev == NULL) {
+   IOAT_PMD_ERR("Unable to allocate raw device");
+   ret = -EINVAL;
+   goto cleanup;
+   }
+
+   rawdev->dev_ops = &ioat_rawdev_ops;
+   rawdev->device = &dev->device;
+   rawdev->driver_name = dev->device.driver->name;
+
+   ioat = rawdev->dev_private;
+   ioat->rawdev = rawdev;
+   ioat->regs = dev->mem_resource[0].addr;
+   ioat->ring_size = 0;
+   ioat->desc_ring = NULL;
+   ioat->status_addr = rte_malloc_virt2iova(ioat) +
+   offsetof(struct rte_ioat_rawdev, status);
+
+   /* do device initialization - reset and set error behaviour */
+   if (ioat->regs->chancnt != 1)
+   IOAT_PMD_ERR("%s: Channel count == %d\n", __func__,
+   ioat->regs->chancnt);
+
+   if (ioat->regs->chanctrl & 0x100) { /* locked by someone else */
+   IOAT_PMD_WARN("%s: Channel appears locked\n", __func__);
+   ioat->regs->chanctrl = 0;
+   }
+
+   ioat->regs->chancmd = RTE_IOAT_CHANCMD_SUSPEND;
+   rte_delay_ms(1);
+   ioat->regs->chancmd = RTE_IOAT_CHANCMD_RESET;
+   rte_delay_ms(1);
+   while (ioat->regs->chancmd & RTE_IOAT_CHANCMD_RESET) {
+   ioat->regs->chainaddr = 0;
+   rte_delay_ms(1);
+   if (++retry >= 200) {
+   IOAT_PMD_ERR("%s: cannot reset device. CHANCMD=0x%llx, CHANSTS=0x%llx, CHANERR=0x%llx\n",
+   __func__,
+   (unsigned long long)ioat->regs->chancmd,
+   (unsigned long long)ioat->regs->chansts,
+   (unsigned long long)ioat->regs->chanerr);
+   ret = -EIO;
+   }
+   }
+   ioat->regs->chanctrl = RTE_IOAT_CHANCTRL_ANY_ERR_ABORT_EN |
+   RTE_IOAT_CHANCTRL_ERR_COMPLETION_EN;
+
return 0;
+
+cleanup:
+   if (rawdev)
+   rte_rawdev_pmd_release(rawdev);
+
+   return ret;
 }
 
 static int
 ioat_rawdev_destroy(const char *name)
 {
-   RTE_SET_USED(name);
+   int ret;
+   struct rte_rawdev *rdev;
+
+   if (!name) {
+   IOAT_PMD_ERR("Invalid device name");
+   return -EINVAL;
+   }
+
+   rdev = rte_rawdev_pmd_get_named_dev(name);
+   if (!rdev) {
+   IOAT_PMD_ERR("Invalid device name (%s)", name);
+   return -EINVAL;
+   }
+
+   /* rte_rawdev_close is called by pmd_release *
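
A note on the reset loop in the create function above: as far as can be
told from the hunk as posted, hitting the retry limit sets ret to -EIO but
neither leaves the loop nor reaches the cleanup label, and the function
then returns 0. The sketch below shows one way a bounded poll could be
written so that the limit actually terminates the wait. Everything in it
is hypothetical (the callback type, the fake device, the function names);
a real driver would sleep between polls, for example with rte_delay_ms().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true while the device is still resetting. */
typedef bool (*sketch_busy_fn)(void *ctx);

/*
 * Poll until the device reports "not busy" or max_tries polls have been
 * made. Returns 0 on success and -EIO if the limit is reached.
 */
static int
sketch_wait_until_clear(sketch_busy_fn still_busy, void *ctx,
			unsigned int max_tries)
{
	unsigned int tries;

	assert(still_busy != NULL);
	for (tries = 0; tries < max_tries; tries++) {
		if (!still_busy(ctx))
			return 0;
		/* a real driver would delay here before polling again */
	}
	return -EIO;
}

/* Fake device for demonstration: busy for a fixed number of reads. */
struct sketch_fake_dev { unsigned int busy_reads; };

static bool
sketch_fake_busy(void *ctx)
{
	struct sketch_fake_dev *d = ctx;

	if (d->busy_reads == 0)
		return false;
	d->busy_reads--;
	return true;
}
```

The caller can then jump to its cleanup path on a non-zero return, so a
device that never leaves reset results in a probe failure instead of a
hang.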

[dpdk-dev] [PATCH 7/8] raw/ioat: add statistics functions

2019-05-30 Thread Bruce Richardson
Add stats functions to track what is happening in the driver, and put
unit tests to check those.

Signed-off-by: Bruce Richardson 
---
 app/test/test_ioat_rawdev.c| 38 ++
 doc/guides/rawdevs/ioat_rawdev.rst | 14 ++
 drivers/raw/ioat/ioat_rawdev.c | 44 ++
 drivers/raw/ioat/rte_ioat_rawdev.h |  6 
 4 files changed, 102 insertions(+)

diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c
index 36e97347c..7081f3365 100644
--- a/app/test/test_ioat_rawdev.c
+++ b/app/test/test_ioat_rawdev.c
@@ -24,6 +24,11 @@ run_ioat_tests(int dev_id)
 #define IOAT_TEST_RINGSIZE 512
struct rte_ioat_rawdev_config p = { .ring_size = -1 };
struct rte_rawdev_info info = { .dev_private = &p };
+   struct rte_rawdev_xstats_name *snames = NULL;
+   uint64_t *stats = NULL;
+   unsigned int *ids = NULL;
+   unsigned int nb_xstats;
+   unsigned int i;
 
rte_rawdev_info_get(dev_id, &info);
if (p.ring_size != 0) {
@@ -48,6 +53,39 @@ run_ioat_tests(int dev_id)
printf("Error with rte_rawdev_start()\n");
return -1;
}
+
+   /* allocate memory for xstats names and values */
+   nb_xstats = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+
+   snames = malloc(sizeof(*snames) * nb_xstats);
+   if (snames == NULL) {
+   printf("Error allocating xstat names memory\n");
+   return -1;
+   }
+   rte_rawdev_xstats_names_get(dev_id, snames, nb_xstats);
+
+   ids = malloc(sizeof(*ids) * nb_xstats);
+   if (ids == NULL) {
+   printf("Error allocating xstat ids memory\n");
+   return -1;
+   }
+   for (i = 0; i < nb_xstats; i++)
+   ids[i] = i;
+
+   stats = malloc(sizeof(*stats) * nb_xstats);
+   if (stats == NULL) {
+   printf("Error allocating xstat memory\n");
+   return -1;
+   }
+
+   rte_rawdev_xstats_get(dev_id, ids, stats, nb_xstats);
+   for (i = 0; i < nb_xstats; i++)
+   printf("%s: %"PRIu64"   ", snames[i].name, stats[i]);
+   printf("\n");
+
+   free(snames);
+   free(stats);
+   free(ids);
return 0;
 }
 
diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst
index b3fe79033..47f12e95c 100644
--- a/doc/guides/rawdevs/ioat_rawdev.rst
+++ b/doc/guides/rawdevs/ioat_rawdev.rst
@@ -111,3 +111,17 @@ The following code shows how the device is configured in
 
 Once configured, the device can then be made ready for use by calling the
 ``rte_rawdev_start()`` API.
+
+Querying Device Statistics
+~~~
+
+The statistics from the IOAT rawdev device can be retrieved via the xstats
+functions in the ``rte_rawdev`` library, i.e.
+``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
+``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
+instance are:
+
+* ``failed_enqueues``
+* ``successful_enqueues``
+* ``copies_started``
+* ``copies_completed``
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index b4b70a1e6..09fbdbf9c 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include "rte_ioat_rawdev.h"
@@ -106,6 +107,47 @@ ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
cfg->ring_size = ioat->ring_size;
 }
 
+static const char *xstat_names[] = {
+   "failed_enqueues", "successful_enqueues",
+   "copies_started", "copies_completed"
+};
+
+static int
+ioat_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+   uint64_t values[], unsigned int n)
+{
+   const struct rte_ioat_rawdev *ioat = dev->dev_private;
+   unsigned int i;
+
+   for (i = 0; i < n; i++) {
+   switch (ids[i]){
+   case 0: values[i] = ioat->enqueue_failed; break;
+   case 1: values[i] = ioat->enqueued; break;
+   case 2: values[i] = ioat->started; break;
+   case 3: values[i] = ioat->completed; break;
+   default: values[i] = 0; break;
+   }
+   }
+   return n;
+}
+
+static int
+ioat_xstats_get_names(const struct rte_rawdev *dev,
+   struct rte_rawdev_xstats_name *names,
+   unsigned int size)
+{
+   unsigned int i;
+
+   RTE_SET_USED(dev);
+   if (size < RTE_DIM(xstat_names))
+   return RTE_DIM(xstat_names);
+
+   for (i = 0; i < RTE_DIM(xstat_names); i++)
+   strlcpy(names[i].name, xstat_names[i], sizeof(names[i]));
+
+   return RTE_DIM(xstat_names);
+}
+
 static int
 ioat_rawdev_create(const char *name, struct rte_pci_device *dev)
 {
@@ -114,6 +156,8 @@ ioat_rawdev_create(const char *name, struct rte_pci_device *dev)
.dev_start = ioat_dev_start,
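
The names callback in this patch follows a "query the size, then fetch"
convention: when the caller's array is too small, it only reports how many
entries are needed. A standalone sketch of that convention is given below.
The struct, macro and function names are hypothetical and the name buffer
size is an arbitrary choice; only the four statistic names come from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_NAME_SIZE 64	/* arbitrary size chosen for this sketch */
#define SKETCH_DIM(a) (sizeof(a) / sizeof((a)[0]))

struct sketch_xstats_name { char name[SKETCH_NAME_SIZE]; };

static const char *const sketch_names[] = {
	"failed_enqueues", "successful_enqueues",
	"copies_started", "copies_completed"
};
static_assert(SKETCH_DIM(sketch_names) == 4, "four statistics expected");

/*
 * If the caller's array is missing or too small, return the number of
 * entries required and write nothing. Otherwise fill the array and
 * return the number of names written.
 */
static int
sketch_xstats_get_names(struct sketch_xstats_name *names, unsigned int size)
{
	unsigned int i;

	if (names == NULL || size < SKETCH_DIM(sketch_names))
		return (int)SKETCH_DIM(sketch_names);

	for (i = 0; i < SKETCH_DIM(sketch_names); i++)
		snprintf(names[i].name, sizeof(names[i].name), "%s",
			 sketch_names[i]);

	return (int)SKETCH_DIM(sketch_names);
}
```

A caller first passes NULL and 0 to learn the count, allocates that many
entries, and then calls again to fill them, which is the sequence the unit
test in this patch goes through.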
  

[dpdk-dev] [PATCH 5/8] raw/ioat: add device info function

2019-05-30 Thread Bruce Richardson
Add in the "info_get" function to the driver, to allow us to query the
device. This allows us to have the unit test pick up the presence of
supported hardware or not.

Signed-off-by: Bruce Richardson 
---
 app/test/meson.build   |  3 +++
 app/test/test_ioat_rawdev.c| 23 
 doc/guides/rawdevs/ioat_rawdev.rst | 34 ++
 drivers/raw/ioat/ioat_rawdev.c | 11 ++
 drivers/raw/ioat/rte_ioat_rawdev.h | 11 ++
 5 files changed, 82 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 9867619d3..9fe3ddc89 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -305,6 +305,9 @@ endif
 if dpdk_conf.has('RTE_LIBRTE_KNI')
test_deps += 'kni'
 endif
+if dpdk_conf.has('RTE_LIBRTE_PMD_IOAT_RAWDEV')
+   test_deps += 'pmd_ioat'
+endif
 
 cflags = machine_args
 if cc.has_argument('-Wno-format-truncation')
diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c
index bd1bb2827..ac1389f6e 100644
--- a/app/test/test_ioat_rawdev.c
+++ b/app/test/test_ioat_rawdev.c
@@ -11,9 +11,32 @@ test_ioat_rawdev(void) { return TEST_SKIPPED; }
 
 #else
 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
 static int
 test_ioat_rawdev(void)
 {
+   const int count = rte_rawdev_count();
+   int i, found = 0;
+
+   printf("Checking %d rawdevs\n", count);
+   for (i = 0; i < count && !found; i++) {
+   struct rte_rawdev_info info = { .dev_private = NULL };
+   found = (rte_rawdev_info_get(i, &info) == 0 &&
+   strcmp(info.driver_name,
+   IOAT_PMD_RAWDEV_NAME_STR) == 0);
+   }
+
+   if (!found) {
+   printf("No IOAT rawdev found, skipping tests\n");
+   return TEST_SKIPPED;
+   }
+
return 0;
 }
 
diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst
index 476b0503f..b68cdffc3 100644
--- a/doc/guides/rawdevs/ioat_rawdev.rst
+++ b/doc/guides/rawdevs/ioat_rawdev.rst
@@ -45,3 +45,37 @@ parameters need to be passed to create or initialize the device.
 Once probed successfully, the device will appear as a ``rawdev``, that is a
 "raw device type" inside DPDK, and can be accessed using APIs from the
 ``rte_rawdev`` library.
+
+Using IOAT Rawdev Devices
+-------------------------
+
+To use the devices from an application, the rawdev API can be used, along
+with definitions taken from the device-specific header file
+``rte_ioat_rawdev.h``. This header is needed to get the definition of
+structure parameters used by some of the rawdev APIs for IOAT rawdev
+devices, as well as providing key functions for using the device for memory
+copies.
+
+Getting Device Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Basic information about each rawdev device can be obtained using the
+``rte_rawdev_info_get()`` API. For most applications, this API will be
+needed to verify that the rawdev in question is of the expected type. For
+example, the following code in ``test_ioat_rawdev.c`` is used to identify
+the IOAT rawdev device used by the tests:
+
+.. code-block:: C
+
+for (i = 0; i < count && !found; i++) {
+struct rte_rawdev_info info = { .dev_private = NULL };
+found = (rte_rawdev_info_get(i, &info) == 0 &&
+strcmp(info.driver_name,
+IOAT_PMD_RAWDEV_NAME_STR) == 0);
+}
+
+When calling the ``rte_rawdev_info_get()`` API for an IOAT rawdev device,
+the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
+be NULL, or else be set to point to a structure of type
+``rte_ioat_rawdev_config``, in which case the size of the configured device
+input ring will be returned in that structure.
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index b6964bccd..90bed2810 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -24,10 +24,21 @@ static struct rte_pci_driver ioat_pmd_drv;
 #define IOAT_PMD_ERR(fmt, args...)IOAT_PMD_LOG(ERR, fmt, ## args)
 #define IOAT_PMD_WARN(fmt, args...)   IOAT_PMD_LOG(WARNING, fmt, ## args)
 
+static void
+ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
+{
+   struct rte_ioat_rawdev_config *cfg = dev_info;
+   struct rte_ioat_rawdev *ioat = dev->dev_private;
+
+   if (cfg != NULL)
+   cfg->ring_size = ioat->ring_size;
+}
+
 static int
 ioat_rawdev_create(const char *name, struct rte_pci_device *dev)
 {
static const struct rte_rawdev_ops ioat_rawdev_ops = {
+   .dev_info_get = ioat_dev_info_get,
};
 
struct rte_rawdev *rawdev = NULL;
diff --git a/drivers/raw/ioat/rte_ioat_rawdev.h b/drivers/raw/ioat/rte_ioat_rawdev.h
index c3216a174..7e0d72ca3 100644
--- a/drivers/raw/ioat/rte_ioat_rawdev.h
+++ b/drivers/raw/ioat/rte_ioat_rawdev.h
@@ -2

[dpdk-dev] [PATCH 6/8] raw/ioat: add configure, start and stop functions

2019-05-30 Thread Bruce Richardson
Allow initializing a driver instance.

Signed-off-by: Bruce Richardson 
---
 app/test/test_ioat_rawdev.c| 35 +-
 doc/guides/rawdevs/ioat_rawdev.rst | 32 +
 drivers/raw/ioat/ioat_rawdev.c | 75 ++
 drivers/raw/ioat/rte_ioat_rawdev.h | 14 ++
 4 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c
index ac1389f6e..36e97347c 100644
--- a/app/test/test_ioat_rawdev.c
+++ b/app/test/test_ioat_rawdev.c
@@ -18,6 +18,39 @@ test_ioat_rawdev(void) { return TEST_SKIPPED; }
 #include 
 #include 
 
+static int
+run_ioat_tests(int dev_id)
+{
+#define IOAT_TEST_RINGSIZE 512
+   struct rte_ioat_rawdev_config p = { .ring_size = -1 };
+   struct rte_rawdev_info info = { .dev_private = &p };
+
+   rte_rawdev_info_get(dev_id, &info);
+   if (p.ring_size != 0) {
+   printf("Error, initial ring size is non-zero (%d)\n",
+   (int)p.ring_size);
+   return -1;
+   }
+
+   p.ring_size = IOAT_TEST_RINGSIZE;
+   if (rte_rawdev_configure(dev_id, &info) != 0) {
+   printf("Error with rte_rawdev_configure()\n");
+   return -1;
+   }
+   rte_rawdev_info_get(dev_id, &info);
+   if (p.ring_size != IOAT_TEST_RINGSIZE) {
+   printf("Error, ring size is not %d (%d)\n",
+   IOAT_TEST_RINGSIZE, (int)p.ring_size);
+   return -1;
+   }
+
+   if (rte_rawdev_start(dev_id) != 0) {
+   printf("Error with rte_rawdev_start()\n");
+   return -1;
+   }
+   return 0;
+}
+
 static int
 test_ioat_rawdev(void)
 {
@@ -37,7 +70,7 @@ test_ioat_rawdev(void)
return TEST_SKIPPED;
}
 
-   return 0;
+   return run_ioat_tests(i);
 }
 
 #endif /* RTE_LIBRTE_PMD_IOAT_RAWDEV */
diff --git a/doc/guides/rawdevs/ioat_rawdev.rst b/doc/guides/rawdevs/ioat_rawdev.rst
index b68cdffc3..b3fe79033 100644
--- a/doc/guides/rawdevs/ioat_rawdev.rst
+++ b/doc/guides/rawdevs/ioat_rawdev.rst
@@ -79,3 +79,35 @@ the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
 be NULL, or else be set to point to a structure of type
 ``rte_ioat_rawdev_config``, in which case the size of the configured device
 input ring will be returned in that structure.
+
+Device Configuration
+~~~~~~~~~~~~~~~~~~~~
+
+Configuring an IOAT rawdev device is done using the
+``rte_rawdev_configure()`` API, which takes the same structure parameters
+as the previously referenced ``rte_rawdev_info_get()`` API. The main
+difference is that, because the parameter is used as input rather than
+output, the ``dev_private`` structure element cannot be NULL, and must
+point to a valid ``rte_ioat_rawdev_config`` structure, containing the ring
+size to be used by the device. The ring size must be a power of two,
+between 64 and 4096.
+
+The following code shows how the device is configured in
+``test_ioat_rawdev.c``:
+
+.. code-block:: C
+
+   #define IOAT_TEST_RINGSIZE 512
+struct rte_ioat_rawdev_config p = { .ring_size = -1 };
+struct rte_rawdev_info info = { .dev_private = &p };
+
+/* ... */
+
+p.ring_size = IOAT_TEST_RINGSIZE;
+if (rte_rawdev_configure(dev_id, &info) != 0) {
+printf("Error with rte_rawdev_configure()\n");
+return -1;
+}
+
+Once configured, the device can then be made ready for use by calling the
+``rte_rawdev_start()`` API.
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index 90bed2810..b4b70a1e6 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -24,6 +24,78 @@ static struct rte_pci_driver ioat_pmd_drv;
 #define IOAT_PMD_ERR(fmt, args...)IOAT_PMD_LOG(ERR, fmt, ## args)
 #define IOAT_PMD_WARN(fmt, args...)   IOAT_PMD_LOG(WARNING, fmt, ## args)
 
+#define DESC_SZ sizeof(struct rte_ioat_desc)
+#define COMPLETION_SZ sizeof(__m128i)
+
+static int
+ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+{
+   struct rte_ioat_rawdev_config *params = config;
+   struct rte_ioat_rawdev *ioat = dev->dev_private;
+   unsigned short i;
+
+   if (dev->started)
+   return -EBUSY;
+
+   if (params == NULL)
+   return -EINVAL;
+
+   if (params->ring_size > 4096 || params->ring_size < 64 ||
+   !rte_is_power_of_2(params->ring_size))
+   return -EINVAL;
+
+   ioat->ring_size = params->ring_size;
+   if (ioat->desc_ring != NULL) {
+   rte_free(ioat->desc_ring);
+   ioat->desc_ring = NULL;
+   }
+
+   /* allocate one block of memory for both descriptors
+* and completion handles.
+*/
+   ioat->desc_ring = rte_zmalloc_socket(NULL,
+   (DESC_SZ + COMPLETION_SZ) * ioat->ring_size,
+   0, /* alignment, de

[dpdk-dev] [PATCH 8/8] raw/ioat: add local API to perform copies

2019-05-30 Thread Bruce Richardson
Add local APIs to trigger data copies, and retrieve handle values once
those copies are completed. Included are unit tests to validate the data
is copied correctly.

Signed-off-by: Bruce Richardson 
---
 app/test/test_ioat_rawdev.c| 159 -
 doc/guides/rawdevs/ioat_rawdev.rst | 100 ++
 drivers/raw/ioat/rte_ioat_rawdev.h | 155 +++-
 3 files changed, 410 insertions(+), 4 deletions(-)

diff --git a/app/test/test_ioat_rawdev.c b/app/test/test_ioat_rawdev.c
index 7081f3365..f2240adec 100644
--- a/app/test/test_ioat_rawdev.c
+++ b/app/test/test_ioat_rawdev.c
@@ -18,6 +18,131 @@ test_ioat_rawdev(void) { return TEST_SKIPPED; }
 #include 
 #include 
 
+static struct rte_mempool *pool;
+
+static int
+test_enqueue_copies(int dev_id)
+{
+   const unsigned int length = 1024;
+   unsigned int i;
+
+   do {
+   struct rte_mbuf *src, *dst;
+   char *src_data, *dst_data;
+   struct rte_mbuf *completed[2] = {0};
+
+   /* test doing a single copy */
+   src = rte_pktmbuf_alloc(pool);
+   dst = rte_pktmbuf_alloc(pool);
+   src->data_len = src->pkt_len = length;
+   dst->data_len = dst->pkt_len = length;
+   src_data = rte_pktmbuf_mtod(src, char *);
+   dst_data = rte_pktmbuf_mtod(dst, char *);
+
+   for (i = 0; i < length; i++)
+   src_data[i] = rand() & 0xFF;
+
+   if (rte_ioat_enqueue_copy(dev_id,
+   src->buf_iova + src->data_off,
+   dst->buf_iova + dst->data_off,
+   length,
+   (uintptr_t)src,
+   (uintptr_t)dst,
+   0 /* no fence */) != 1) {
+   printf("Error with rte_ioat_enqueue_copy\n");
+   return -1;
+   }
+   rte_ioat_do_copies(dev_id);
+   usleep(10);
+
+   if (rte_ioat_completed_copies(dev_id, 1, (void *)&completed[0],
+   (void *)&completed[1]) != 1) {
+   printf("Error with rte_ioat_completed_copies\n");
+   return -1;
+   }
+   if (completed[0] != src || completed[1] != dst) {
+   printf("Error with completions: got (%p, %p), not (%p,%p)\n",
+   completed[0], completed[1], src, dst);
+   return -1;
+   }
+
+   for (i = 0; i < length; i++)
+   if (dst_data[i] != src_data[i]) {
+   printf("Data mismatch at char %u\n", i);
+   return -1;
+   }
+   rte_pktmbuf_free(src);
+   rte_pktmbuf_free(dst);
+   } while(0);
+
+   /* test doing multiple copies */
+   do {
+   struct rte_mbuf *srcs[32], *dsts[32];
+   struct rte_mbuf *completed_src[64];
+   struct rte_mbuf *completed_dst[64];
+   unsigned int j;
+
+   for (i = 0; i < RTE_DIM(srcs); i++) {
+   char *src_data;
+
+   srcs[i] = rte_pktmbuf_alloc(pool);
+   dsts[i] = rte_pktmbuf_alloc(pool);
+   srcs[i]->data_len = srcs[i]->pkt_len = length;
+   dsts[i]->data_len = dsts[i]->pkt_len = length;
+   src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+   for (j = 0; j < length; j++)
+   src_data[j] = rand() & 0xFF;
+
+   if (rte_ioat_enqueue_copy(dev_id,
+   srcs[i]->buf_iova + srcs[i]->data_off,
+   dsts[i]->buf_iova + dsts[i]->data_off,
+   length,
+   (uintptr_t)srcs[i],
+   (uintptr_t)dsts[i],
+   0 /* nofence */) != 1) {
+   printf("Error with rte_ioat_enqueue_copy for buffer %u\n",
+   i);
+   return -1;
+   }
+   }
+   rte_ioat_do_copies(dev_id);
+   usleep(100);
+
+   if (rte_ioat_completed_copies(dev_id, 64, (void *)completed_src,
+   (void *)completed_dst) != RTE_DIM(srcs)) {
+   printf("Error with rte_ioat_completed_copies\n");
+   return -1;
+   }
+   for (i = 0; i < RTE_DIM(srcs); i++) {
+   char *src_data, *dst_data;
+
+   if (completed_src[i] != srcs[i]) {
+   pri

[dpdk-dev] [PATCH v2 01/12] eal: Make rte_eal_using_phys_addrs work sooner

2019-05-30 Thread Ben Walker
This function only returned the correct answer after
a call to initialize the memory subsystem. Make it work
prior to that.

Signed-off-by: Ben Walker 
---
 lib/librte_eal/linux/eal/eal_memory.c | 63 ---
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_memory.c b/lib/librte_eal/linux/eal/eal_memory.c
index 416dad898..0c07bb946 100644
--- a/lib/librte_eal/linux/eal/eal_memory.c
+++ b/lib/librte_eal/linux/eal/eal_memory.c
@@ -66,34 +66,8 @@
  * zone as well as a physical contiguous zone.
  */
 
-static bool phys_addrs_available = true;
-
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
-static void
-test_phys_addrs_available(void)
-{
-   uint64_t tmp = 0;
-   phys_addr_t physaddr;
-
-   if (!rte_eal_has_hugepages()) {
-   RTE_LOG(ERR, EAL,
-   "Started without hugepages support, physical addresses not available\n");
-   phys_addrs_available = false;
-   return;
-   }
-
-   physaddr = rte_mem_virt2phy(&tmp);
-   if (physaddr == RTE_BAD_PHYS_ADDR) {
-   if (rte_eal_iova_mode() == RTE_IOVA_PA)
-   RTE_LOG(ERR, EAL,
-   "Cannot obtain physical addresses: %s. "
-   "Only vfio will function.\n",
-   strerror(errno));
-   phys_addrs_available = false;
-   }
-}
-
 /*
  * Get physical address of any mapped virtual address in the current process.
  */
@@ -107,7 +81,7 @@ rte_mem_virt2phy(const void *virtaddr)
off_t offset;
 
/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-   if (!phys_addrs_available)
+   if (!rte_eal_using_phys_addrs())
return RTE_BAD_IOVA;
 
/* standard page size */
@@ -1336,8 +1310,6 @@ eal_legacy_hugepage_init(void)
int nr_hugefiles, nr_hugepages = 0;
void *addr;
 
-   test_phys_addrs_available();
-
memset(used_hp, 0, sizeof(used_hp));
 
/* get pointer to global configuration */
@@ -1516,7 +1488,7 @@ eal_legacy_hugepage_init(void)
continue;
}
 
-   if (phys_addrs_available &&
+   if (rte_eal_using_phys_addrs() &&
rte_eal_iova_mode() != RTE_IOVA_VA) {
/* find physical addresses for each hugepage */
if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0) {
@@ -1735,8 +1707,6 @@ eal_hugepage_init(void)
uint64_t memory[RTE_MAX_NUMA_NODES];
int hp_sz_idx, socket_id;
 
-   test_phys_addrs_available();
-
memset(used_hp, 0, sizeof(used_hp));
 
for (hp_sz_idx = 0;
@@ -1879,8 +1849,6 @@ eal_legacy_hugepage_attach(void)
"into secondary processes\n");
}
 
-   test_phys_addrs_available();
-
fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY);
if (fd_hugepage < 0) {
RTE_LOG(ERR, EAL, "Could not open %s\n",
@@ -2020,7 +1988,32 @@ rte_eal_hugepage_attach(void)
 int
 rte_eal_using_phys_addrs(void)
 {
-   return phys_addrs_available;
+   static int using_phys_addrs = -1;
+   uint64_t tmp = 0;
+   phys_addr_t physaddr;
+
+   if (using_phys_addrs != -1)
+   return using_phys_addrs;
+
+   /* Set the default to 1 */
+   using_phys_addrs = 1;
+
+   if (!rte_eal_has_hugepages()) {
+   RTE_LOG(ERR, EAL,
+   "Started without hugepages support, physical addresses not available\n");
+   using_phys_addrs = 0;
+   return using_phys_addrs;
+   }
+
+   physaddr = rte_mem_virt2phy(&tmp);
+   if (physaddr == RTE_BAD_PHYS_ADDR) {
+   if (rte_eal_iova_mode() == RTE_IOVA_PA)
+   RTE_LOG(ERR, EAL,
+   "Cannot obtain physical addresses. Only vfio will function.\n");
+   using_phys_addrs = 0;
+   }
+
+   return using_phys_addrs;
 }
 
 static int __rte_unused
-- 
2.20.1
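
The caching pattern this patch uses in rte_eal_using_phys_addrs() can be
sketched in isolation as below. This is a minimal illustration, not the EAL
code: probe_phys_addrs() and probe_calls are hypothetical stand-ins for the
hugepage and rte_mem_virt2phy() checks. It also shows why the patch sets the
cached value to 1 before probing: rte_mem_virt2phy() itself calls
rte_eal_using_phys_addrs(), so the nested call has to see a decided value.
The sketch assumes the first call happens on a single thread, since the
static is not protected against concurrent callers.

```c
#include <stdbool.h>

static int using_phys_addrs(void);

/* Hypothetical counter: how many times the expensive probe has run. */
static int probe_calls;

/* Hypothetical stand-in for the real probe. Like rte_mem_virt2phy() in
 * the patch, it consults the cached flag before doing any work. */
static bool
probe_phys_addrs(void)
{
	probe_calls++;
	if (!using_phys_addrs())
		return false;
	return true; /* assumption: pretend the lookup succeeded */
}

/* Tri-state cache: -1 = not probed yet, 0 = unavailable, 1 = available. */
static int
using_phys_addrs(void)
{
	static int cached = -1;

	if (cached != -1)
		return cached;

	/* Optimistic default, so the nested call made by the probe
	 * returns 1 instead of recursing without end. */
	cached = 1;
	if (!probe_phys_addrs())
		cached = 0;

	return cached;
}
```

However many times using_phys_addrs() is called, the probe body runs once;
later calls return the cached value.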



[dpdk-dev] [PATCH v2 03/12] eal/pci: Rework loops in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
Make all of the loops first iterate over devices, then
drivers. This is in preparation for combining them
into a single loop.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index d3177916a..70815e4f0 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -589,10 +589,10 @@ rte_pci_get_iommu_class(void)
if (!is_bound)
return RTE_IOVA_DC;
 
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (dev->kdrv == RTE_KDRV_VFIO &&
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (dev->kdrv == RTE_KDRV_VFIO) {
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
+   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA &&
rte_pci_match(drv, dev)) {
has_iova_va = true;
break;
@@ -631,8 +631,8 @@ rte_pci_get_iommu_class(void)
}
 
break_out = false;
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
/*
-- 
2.20.1



[dpdk-dev] [PATCH v2 04/12] eal/pci: Collapse two loops in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
Two of these loops easily collapse into a single loop.
This sets the stage for future simplifications.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 31 ++-
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 70815e4f0..29ffae77f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -571,7 +571,6 @@ rte_pci_get_iommu_class(void)
bool has_iova_va = false;
bool is_bound_uio = false;
bool iommu_no_va = false;
-   bool break_out;
bool need_check;
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
@@ -592,8 +591,16 @@ rte_pci_get_iommu_class(void)
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA &&
-   rte_pci_match(drv, dev)) {
+   if (!rte_pci_match(drv, dev))
+   continue;
+
+   /*
+* just one PCI device needs to be checked out because
+* the IOMMU hardware is the same for all of them.
+*/
+   iommu_no_va = !pci_one_device_iommu_support_va(dev);
+
+   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
has_iova_va = true;
break;
}
@@ -630,24 +637,6 @@ rte_pci_get_iommu_class(void)
}
}
 
-   break_out = false;
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (!rte_pci_match(drv, dev))
-   continue;
-   /*
-* just one PCI device needs to be checked out because
-* the IOMMU hardware is the same for all of them.
-*/
-   iommu_no_va = !pci_one_device_iommu_support_va(dev);
-   break_out = true;
-   break;
-   }
-
-   if (break_out)
-   break;
-   }
-
 #ifdef VFIO_PRESENT
is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
true : false;
-- 
2.20.1



[dpdk-dev] [PATCH v2 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
This is in preparation for future simplifications. The
functions are simply inlined for now.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 176 +++-
 1 file changed, 71 insertions(+), 105 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index c99d523f0..d3177916a 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -497,86 +497,6 @@ rte_pci_scan(void)
return -1;
 }
 
-/*
- * Is pci device bound to any kdrv
- */
-static inline int
-pci_one_device_is_bound(void)
-{
-   struct rte_pci_device *dev = NULL;
-   int ret = 0;
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (dev->kdrv == RTE_KDRV_UNKNOWN ||
-   dev->kdrv == RTE_KDRV_NONE) {
-   continue;
-   } else {
-   ret = 1;
-   break;
-   }
-   }
-   return ret;
-}
-
-/*
- * Any one of the device bound to uio
- */
-static inline int
-pci_one_device_bound_uio(void)
-{
-   struct rte_pci_device *dev = NULL;
-   struct rte_devargs *devargs;
-   int need_check;
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   devargs = dev->device.devargs;
-
-   need_check = 0;
-   switch (rte_pci_bus.bus.conf.scan_mode) {
-   case RTE_BUS_SCAN_WHITELIST:
-   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
-   need_check = 1;
-   break;
-   case RTE_BUS_SCAN_UNDEFINED:
-   case RTE_BUS_SCAN_BLACKLIST:
-   if (devargs == NULL ||
-   devargs->policy != RTE_DEV_BLACKLISTED)
-   need_check = 1;
-   break;
-   }
-
-   if (!need_check)
-   continue;
-
-   if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-  dev->kdrv == RTE_KDRV_UIO_GENERIC) {
-   return 1;
-   }
-   }
-   return 0;
-}
-
-/*
- * Any one of the device has iova as va
- */
-static inline int
-pci_one_device_has_iova_va(void)
-{
-   struct rte_pci_device *dev = NULL;
-   struct rte_pci_driver *drv = NULL;
-
-   FOREACH_DRIVER_ON_PCIBUS(drv) {
-   if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (dev->kdrv == RTE_KDRV_VFIO &&
-   rte_pci_match(drv, dev))
-   return 1;
-   }
-   }
-   }
-   return 0;
-}
-
 #if defined(RTE_ARCH_X86)
 static bool
 pci_one_device_iommu_support_va(struct rte_pci_device *dev)
@@ -641,14 +561,76 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 #endif
 
 /*
- * All devices IOMMUs support VA as IOVA
+ * Get iommu class of PCI devices on the bus.
  */
-static bool
-pci_devices_iommu_support_va(void)
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
 {
+   bool is_bound = false;
+   bool is_vfio_noiommu_enabled = true;
+   bool has_iova_va = false;
+   bool is_bound_uio = false;
+   bool iommu_no_va = false;
+   bool break_out;
+   bool need_check;
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
+   struct rte_devargs *devargs;
+
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+   dev->kdrv == RTE_KDRV_NONE) {
+   continue;
+   } else {
+   is_bound = true;
+   break;
+   }
+   }
+   if (!is_bound)
+   return RTE_IOVA_DC;
 
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
+   if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (dev->kdrv == RTE_KDRV_VFIO &&
+   rte_pci_match(drv, dev)) {
+   has_iova_va = true;
+   break;
+   }
+   }
+
+   if (has_iova_va)
+   break;
+   }
+   }
+
+   FOREACH_DEVICE_ON_PCIBUS(dev) {
+   devargs = dev->device.devargs;
+
+   need_check = false;
+   switch (rte_pci_bus.bus.conf.scan_mode) {
+   case RTE_BUS_SCAN_WHITELIST:
+   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+   need_check = true;
+   break;
+   case RTE_BUS_SCAN_UNDEFINED:
+   case RTE_BUS_SCAN_BLACKLIST:
+   if (devargs == NULL ||
+   devarg

[dpdk-dev] [PATCH v2 05/12] eal/pci: Add function pci_ignore_device

2019-05-30 Thread Ben Walker
This performs a check for whether the device should be ignored
due to whitelist or blacklist. This check eventually needs
to apply to all of the other checks in rte_pci_get_iommu_class.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 44 +
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 29ffae77f..6d311f4e0 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -560,6 +560,29 @@ pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 }
 #endif
 
+static bool
+pci_ignore_device(struct rte_pci_device *dev)
+{
+   struct rte_devargs *devargs;
+
+   devargs = dev->device.devargs;
+
+   switch (rte_pci_bus.bus.conf.scan_mode) {
+   case RTE_BUS_SCAN_WHITELIST:
+   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+   return false;
+   break;
+   case RTE_BUS_SCAN_UNDEFINED:
+   case RTE_BUS_SCAN_BLACKLIST:
+   if (devargs == NULL ||
+   devargs->policy != RTE_DEV_BLACKLISTED)
+   return false;
+   break;
+   }
+
+   return true;
+}
+
 /*
  * Get iommu class of PCI devices on the bus.
  */
@@ -571,10 +594,9 @@ rte_pci_get_iommu_class(void)
bool has_iova_va = false;
bool is_bound_uio = false;
bool iommu_no_va = false;
-   bool need_check;
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
-   struct rte_devargs *devargs;
+
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (dev->kdrv == RTE_KDRV_UNKNOWN ||
@@ -612,23 +634,7 @@ rte_pci_get_iommu_class(void)
}
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
-   devargs = dev->device.devargs;
-
-   need_check = false;
-   switch (rte_pci_bus.bus.conf.scan_mode) {
-   case RTE_BUS_SCAN_WHITELIST:
-   if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
-   need_check = true;
-   break;
-   case RTE_BUS_SCAN_UNDEFINED:
-   case RTE_BUS_SCAN_BLACKLIST:
-   if (devargs == NULL ||
-   devargs->policy != RTE_DEV_BLACKLISTED)
-   need_check = true;
-   break;
-   }
-
-   if (!need_check)
+   if (pci_ignore_device(dev))
continue;
 
if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-- 
2.20.1
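
The policy that pci_ignore_device() encodes can be tried out on its own with
a sketch like the following. The enum and struct names here (scan_mode,
dev_policy, devargs, ignore_device) are simplified, hypothetical stand-ins
for the rte_bus and rte_devargs types; only the decision logic is meant to
mirror the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the DPDK scan-mode and policy enums. */
enum scan_mode { SCAN_UNDEFINED, SCAN_WHITELIST, SCAN_BLACKLIST };
enum dev_policy { DEV_WHITELISTED, DEV_BLACKLISTED };

/* Hypothetical stand-in for struct rte_devargs. */
struct devargs {
	enum dev_policy policy;
};

/* Return true when the device must be skipped. */
static bool
ignore_device(enum scan_mode mode, const struct devargs *args)
{
	switch (mode) {
	case SCAN_WHITELIST:
		/* Whitelist mode: only explicitly whitelisted devices count. */
		return !(args != NULL && args->policy == DEV_WHITELISTED);
	case SCAN_UNDEFINED:
	case SCAN_BLACKLIST:
		/* Otherwise: everything counts unless explicitly blacklisted. */
		return args != NULL && args->policy == DEV_BLACKLISTED;
	}

	/* Unknown scan mode: be conservative and skip the device. */
	return true;
}
```

A device with no devargs is therefore skipped in whitelist mode and kept in
blacklist or undefined mode, which matches the switch in the patch.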



[dpdk-dev] [PATCH v2 08/12] eal/pci: Collapse loops in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
The three loops can now be easily combined into one.

This is slightly less efficient than before because it
doesn't break out early. But that can be addressed
later.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 549d61e74..765c473e8 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -604,15 +604,7 @@ rte_pci_get_iommu_class(void)
if (dev->kdrv != RTE_KDRV_UNKNOWN &&
dev->kdrv != RTE_KDRV_NONE) {
is_bound = true;
-   break;
}
-   }
-   if (!is_bound)
-   return RTE_IOVA_DC;
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (pci_ignore_device(dev))
-   continue;
 
if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
@@ -630,15 +622,7 @@ rte_pci_get_iommu_class(void)
break;
}
}
-
-   if (has_iova_va)
-   break;
}
-   }
-
-   FOREACH_DEVICE_ON_PCIBUS(dev) {
-   if (pci_ignore_device(dev))
-   continue;
 
if (dev->kdrv == RTE_KDRV_IGB_UIO ||
   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
@@ -646,6 +630,9 @@ rte_pci_get_iommu_class(void)
}
}
 
+   if (!is_bound)
+   return RTE_IOVA_DC;
+
 #ifdef VFIO_PRESENT
is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
true : false;
-- 
2.20.1



[dpdk-dev] [PATCH v2 07/12] eal/pci: Reverse if check in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
It's simpler to reverse the if statement here, especially
with an upcoming simplification.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index d2464d2ae..549d61e74 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -601,10 +601,8 @@ rte_pci_get_iommu_class(void)
if (pci_ignore_device(dev))
continue;
 
-   if (dev->kdrv == RTE_KDRV_UNKNOWN ||
-   dev->kdrv == RTE_KDRV_NONE) {
-   continue;
-   } else {
+   if (dev->kdrv != RTE_KDRV_UNKNOWN &&
+   dev->kdrv != RTE_KDRV_NONE) {
is_bound = true;
break;
}
-- 
2.20.1



[dpdk-dev] [PATCH v2 09/12] eal/pci: Simplify rte_pci_get_iommu_class by using a switch

2019-05-30 Thread Ben Walker
Take several independent if statements and convert to a
switch statement.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 765c473e8..5e61f46c8 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -601,12 +601,12 @@ rte_pci_get_iommu_class(void)
if (pci_ignore_device(dev))
continue;
 
-   if (dev->kdrv != RTE_KDRV_UNKNOWN &&
-   dev->kdrv != RTE_KDRV_NONE) {
+   switch (dev->kdrv) {
+   case RTE_KDRV_UNKNOWN:
+   case RTE_KDRV_NONE:
+   break;
+   case RTE_KDRV_VFIO:
is_bound = true;
-   }
-
-   if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
@@ -622,11 +622,14 @@ rte_pci_get_iommu_class(void)
break;
}
}
-   }
-
-   if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-  dev->kdrv == RTE_KDRV_UIO_GENERIC) {
+   break;
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   is_bound = true;
is_bound_uio = true;
+   break;
+
}
}
 
-- 
2.20.1



[dpdk-dev] [PATCH v2 06/12] eal/pci: Correctly test whitelist/blacklist in rte_pci_get_iommu_class

2019-05-30 Thread Ben Walker
All of the checks should respect the white and black lists.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 6d311f4e0..d2464d2ae 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -597,8 +597,10 @@ rte_pci_get_iommu_class(void)
struct rte_pci_device *dev = NULL;
struct rte_pci_driver *drv = NULL;
 
-
FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (pci_ignore_device(dev))
+   continue;
+
if (dev->kdrv == RTE_KDRV_UNKNOWN ||
dev->kdrv == RTE_KDRV_NONE) {
continue;
@@ -611,6 +613,9 @@ rte_pci_get_iommu_class(void)
return RTE_IOVA_DC;
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
+   if (pci_ignore_device(dev))
+   continue;
+
if (dev->kdrv == RTE_KDRV_VFIO) {
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
-- 
2.20.1



[dpdk-dev] [PATCH v2 10/12] eal/pci: Finding a device bound to UIO does not force PA

2019-05-30 Thread Ben Walker
If a device is found that is bound to the UIO driver,
only force IOVA_PA if there is a driver registered to use it.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5e61f46c8..a71c66380 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -627,7 +627,13 @@ rte_pci_get_iommu_class(void)
case RTE_KDRV_UIO_GENERIC:
case RTE_KDRV_NIC_UIO:
is_bound = true;
-   is_bound_uio = true;
+   FOREACH_DRIVER_ON_PCIBUS(drv) {
+   if (!rte_pci_match(drv, dev))
+   continue;
+
+   is_bound_uio = true;
+   break;
+   }
break;
 
}
-- 
2.20.1
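
The behaviour change in this patch, where a UIO-bound device only counts if
some registered driver matches it, can be sketched as below. The types and
names (kdrv, dev, has_driver, uio_forces_pa) are hypothetical
simplifications; in the real code the match is found by walking
FOREACH_DRIVER_ON_PCIBUS() and calling rte_pci_match().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of kernel driver bindings. */
enum kdrv { KDRV_NONE, KDRV_VFIO, KDRV_UIO };

/* Hypothetical device record; has_driver stands in for "some registered
 * driver matches this device". */
struct dev {
	enum kdrv kdrv;
	bool has_driver;
};

/* A UIO-bound device forces PA only when a driver will actually use it. */
static bool
uio_forces_pa(const struct dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].kdrv == KDRV_UIO && devs[i].has_driver)
			return true;
	}
	return false;
}
```

With this rule, a device that happens to be bound to a UIO kernel module but
has no matching driver in the application no longer influences the IOVA mode.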



[dpdk-dev] [PATCH v2 11/12] eal/pci: rte_pci_get_iommu_class handles no drivers

2019-05-30 Thread Ben Walker
In the case where no drivers are registered with the system,
rte_pci_get_iommu_class should return RTE_IOVA_DC.

Signed-off-by: Ben Walker 
---
 drivers/bus/pci/linux/pci.c | 91 -
 1 file changed, 50 insertions(+), 41 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index a71c66380..60424932e 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -589,49 +589,80 @@ pci_ignore_device(struct rte_pci_device *dev)
 enum rte_iova_mode
 rte_pci_get_iommu_class(void)
 {
-   bool is_bound = false;
-   bool is_vfio_noiommu_enabled = true;
-   bool has_iova_va = false;
-   bool is_bound_uio = false;
-   bool iommu_no_va = false;
-   struct rte_pci_device *dev = NULL;
-   struct rte_pci_driver *drv = NULL;
+   struct rte_pci_device *dev;
+   struct rte_pci_driver *drv;
+   struct rte_pci_addr *addr;
+   enum rte_iova_mode iova_mode;
+
+   iova_mode = RTE_IOVA_DC;
 
FOREACH_DEVICE_ON_PCIBUS(dev) {
if (pci_ignore_device(dev))
continue;
 
+   addr = &dev->addr;
+
switch (dev->kdrv) {
case RTE_KDRV_UNKNOWN:
case RTE_KDRV_NONE:
break;
case RTE_KDRV_VFIO:
-   is_bound = true;
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
 
-   /*
-* just one PCI device needs to be checked out because
-* the IOMMU hardware is the same for all of them.
-*/
-   iommu_no_va = !pci_one_device_iommu_support_va(dev);
+   if ((drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0)
+   continue;
 
-   if (drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-   has_iova_va = true;
-   break;
+   if (!pci_one_device_iommu_support_va(dev)) {
+   RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT
+   " wanted IOVA as VA, but ",
+   addr->domain, addr->bus, addr->devid,
+   addr->function);
+   RTE_LOG(WARNING, EAL,
+   "IOMMU does not support it.\n");
+   iova_mode = RTE_IOVA_PA;
+   }
+#ifdef VFIO_PRESENT
+   else if (rte_vfio_noiommu_is_enabled()) {
+   RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT
+   " wanted IOVA as VA, but ",
+   addr->domain, addr->bus, addr->devid,
+   addr->function);
+   RTE_LOG(WARNING, EAL,
+   "vfio-noiommu is enabled.\n");
+   iova_mode = RTE_IOVA_PA;
+#endif
+   } else if (iova_mode == RTE_IOVA_PA) {
+   RTE_LOG(WARNING, EAL, "Device " PCI_PRI_FMT
+   " wanted IOVA as VA, but ",
+   addr->domain, addr->bus, addr->devid,
+   addr->function);
+   RTE_LOG(WARNING, EAL,
+   "other devices require PA.\n");
+   } else {
+   iova_mode = RTE_IOVA_VA;
}
}
break;
case RTE_KDRV_IGB_UIO:
case RTE_KDRV_UIO_GENERIC:
case RTE_KDRV_NIC_UIO:
-   is_bound = true;
FOREACH_DRIVER_ON_PCIBUS(drv) {
if (!rte_pci_match(drv, dev))
continue;
 
-   is_bound_uio = true;
+   if (iova_mode == RTE_IOVA_VA) {
+   RTE_LOG(WARNING, EAL,
+   "Some devices wanted IOVA as VA, but ");
+   RTE_LOG(WARNING, EAL, "device " PCI_PRI_FMT
+   " requires PA.\n",
+   addr->domain
