Re: [dpdk-dev] [PATCH] ether: fix invalid string length in ethdev name comparison

2018-02-27 Thread Mohammad Abdul Awal



On 27/02/2018 00:15, Ananyev, Konstantin wrote:



-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Mohammad Abdul Awal

+   len1 = strlen(name);
for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+   len2 = strlen(rte_eth_dev_shared_data->data[pid].name);
+   len = len1 > len2 ? len1 : len2;
if (rte_eth_devices[pid].state != RTE_ETH_DEV_UNUSED &&
!strncmp(name, rte_eth_dev_shared_data->data[pid].name,
-strlen(name))) {
+len)) {

Why just not simply use strcmp()? :)

That is the best I would say. I will submit a V2.




*port_id = pid;
return 0;
}
--
2.7.4




[dpdk-dev] [PATCH v2] ether: fix invalid string length in ethdev name comparison

2018-02-27 Thread Mohammad Abdul Awal
The current code compares the two strings only up to the length of the
1st string (the searched name). If the 1st string is a prefix of the
2nd string (an existing name), the comparison wrongly returns the
port_id of the earliest prefix match.
This patch fixes the bug by using strcmp instead of strncmp.

Fixes: 9c5b8d8b9fe ("ethdev: clean port id retrieval when attaching")

Signed-off-by: Mohammad Abdul Awal 
---
 lib/librte_ether/rte_ethdev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..3b885a6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -572,8 +572,7 @@ rte_eth_dev_get_port_by_name(const char *name, uint16_t *port_id)
 
for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
if (rte_eth_devices[pid].state != RTE_ETH_DEV_UNUSED &&
-   !strncmp(name, rte_eth_dev_shared_data->data[pid].name,
-strlen(name))) {
+   !strcmp(name, rte_eth_dev_shared_data->data[pid].name)) {
*port_id = pid;
return 0;
}
-- 
2.7.4



Re: [dpdk-dev] Back-up committers for subtrees

2018-02-27 Thread De Lara Guarch, Pablo
Hi,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of De Lara Guarch,
> Pablo
> Sent: Wednesday, February 21, 2018 11:05 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] Back-up committers for subtrees
> 
> Hi everyone,
> 
> In the last few releases, the DPDK community has experienced a significant
> growth, specifically in patches submitted and integrated.
> Therefore, the number of subtrees has increased, to help maintain the
> scalability.
> 
> Section 5.3 of the Contributor's guide
> (http://dpdk.org/doc/guides/contributing/patches.html), Maintainers and
> Sub-trees, states that there should be a backup maintainer per subtree:
> 
> "Ensure that there is a designated back-up maintainer and coordinate a
> handover for periods where the tree maintainer can't perform their roles."
> 
> However, this is not the case for some of the subtrees. This could lead to
> patch integration delays in case of unavailability (short or long term) of the
> subtree committer, especially on busy times, which could lead to a delay in
> an RC release.
> 
> My suggestion is that every primary subtree committer proposes a back-up
> committer for their subtree.
> Once the proposed person agrees on taking this role, they should follow the
> procedure explained in the documentation:
> 
> 
> "Tree maintainers can be added or removed by submitting a patch to the
> MAINTAINERS file.
> 
> The proposer should justify the need for a new sub-tree and should have
> demonstrated a sufficient level of contributions in the area or to a similar
> area.
> 
> The maintainer should be confirmed by an ack from an existing tree
> maintainer. Disagreements on trees or maintainers can be brought to the
> Technical Board.
> 
> 
> 
> The backup maintainer for the master tree should be selected from the
> existing sub-tree maintainers from the project.
> 
> The backup maintainer for a sub-tree should be selected from among the
> component maintainers within that sub-tree."
> 
> The patches will be merged mainly by the primary committer, except when
> this is unavailable, in which case, the back-up committer will take over,
> after this unavailability is communicated between the two committers.
> This implies that there will be no co-maintainership, to maintain a single
> point of contact with the mainline tree maintainer.
> 
> Any objections?

CC'ing Tech board. Could this be discussed in the next meeting?

Thanks,
Pablo

> 
> Thanks,
> Pablo
> 



[dpdk-dev] [RFC 1/7] net/af_xdp: new PMD driver

2018-02-27 Thread Qi Zhang
This is the vanilla version.
Packet data is copied between the af_xdp memory buffer and the mbuf
mempool. Indexes of the memory buffers are managed by a simple FIFO
ring.

Signed-off-by: Qi Zhang 
---
 config/common_base|   5 +
 config/common_linuxapp|   1 +
 drivers/net/Makefile  |   1 +
 drivers/net/af_xdp/Makefile   |  56 ++
 drivers/net/af_xdp/meson.build|   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c   | 763 ++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 drivers/net/af_xdp/xdpsock_queue.h|  62 +++
 mk/rte.app.mk |   1 +
 9 files changed, 900 insertions(+)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
 create mode 100644 drivers/net/af_xdp/xdpsock_queue.h

diff --git a/config/common_base b/config/common_base
index ad03cf433..84b7b3b7e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -368,6 +368,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
+#
 # Compile link bonding PMD library
 #
 CONFIG_RTE_LIBRTE_PMD_BOND=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..3b10695b6 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -16,6 +16,7 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e1127326b..409234ac3 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 0..ac38e20bf
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014 John W. Linville 
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2014 6WIND S.A.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3 -I/opt/af_xdp/linux_headers/include
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 0..4b5299c8e
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 I

[dpdk-dev] [RFC 0/7] PMD driver for AF_XDP

2018-02-27 Thread Qi Zhang
The RFC patches add a new PMD driver for AF_XDP, a proposed faster
version of the AF_PACKET interface in Linux; see the links below for an
introduction to AF_XDP:
https://fosdem.org/2018/schedule/event/af_xdp/
https://lwn.net/Articles/745934/

This patchset is based on v18.02.
It also requires a Linux kernel with the AF_XDP RFC patches below
applied:
https://patchwork.ozlabs.org/patch/867961/
https://patchwork.ozlabs.org/patch/867960/
https://patchwork.ozlabs.org/patch/867938/
https://patchwork.ozlabs.org/patch/867939/
https://patchwork.ozlabs.org/patch/867940/
https://patchwork.ozlabs.org/patch/867941/
https://patchwork.ozlabs.org/patch/867942/
https://patchwork.ozlabs.org/patch/867943/
https://patchwork.ozlabs.org/patch/867944/
https://patchwork.ozlabs.org/patch/867945/
https://patchwork.ozlabs.org/patch/867946/
https://patchwork.ozlabs.org/patch/867947/
https://patchwork.ozlabs.org/patch/867948/
https://patchwork.ozlabs.org/patch/867949/
https://patchwork.ozlabs.org/patch/867950/
https://patchwork.ozlabs.org/patch/867951/
https://patchwork.ozlabs.org/patch/867952/
https://patchwork.ozlabs.org/patch/867953/
https://patchwork.ozlabs.org/patch/867954/
https://patchwork.ozlabs.org/patch/867955/
https://patchwork.ozlabs.org/patch/867956/
https://patchwork.ozlabs.org/patch/867957/
https://patchwork.ozlabs.org/patch/867958/
https://patchwork.ozlabs.org/patch/867959/

There is no clean upstream target yet since the kernel patches are
still in the RFC stage. The purpose of this patchset is to let anyone
who wants to evaluate af_xdp with a DPDK application do so and give
feedback for further improvement.

To try the new PMD:
1. compile and install a kernel with the above patches applied.
2. configure $LINUX_HEADER_DIR (dir of "make headers_install")
   and $TOOLS_DIR (the /tools dir) in drivers/net/af_xdp/Makefile
   before compiling DPDK.
3. make sure libelf and libbpf are installed.

BTW, performance tests show the PMD can reach 94%~98% of the original
benchmark when shared memory is enabled.

Qi Zhang (7):
  net/af_xdp: new PMD driver
  lib/mbuf: enable parse flags when create mempool
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable share mempool
  net/af_xdp: load BPF file
  app/testpmd: enable parameter for mempool flags

 app/test-pmd/parameters.c |  12 +
 app/test-pmd/testpmd.c|  15 +-
 app/test-pmd/testpmd.h|   1 +
 config/common_base|   5 +
 config/common_linuxapp|   1 +
 drivers/net/Makefile  |   1 +
 drivers/net/af_xdp/Makefile   |  60 ++
 drivers/net/af_xdp/bpf_load.c | 798 +++
 drivers/net/af_xdp/bpf_load.h |  65 ++
 drivers/net/af_xdp/libbpf.h   | 199 ++
 drivers/net/af_xdp/meson.build|   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c   | 878 ++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 drivers/net/af_xdp/xdpsock_queue.h|  62 ++
 lib/librte_mbuf/rte_mbuf.c|  15 +-
 lib/librte_mbuf/rte_mbuf.h|   8 +-
 lib/librte_mempool/rte_mempool.c  |   2 +
 lib/librte_mempool/rte_mempool.h  |   1 +
 mk/rte.app.mk |   1 +
 19 files changed, 2125 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/bpf_load.c
 create mode 100644 drivers/net/af_xdp/bpf_load.h
 create mode 100644 drivers/net/af_xdp/libbpf.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
 create mode 100644 drivers/net/af_xdp/xdpsock_queue.h

-- 
2.13.6



[dpdk-dev] [RFC 3/7] lib/mempool: allow page size aligned mempool

2018-02-27 Thread Qi Zhang
Allow creating a mempool with a page-size-aligned base address.

Signed-off-by: Qi Zhang 
---
 lib/librte_mempool/rte_mempool.c | 2 ++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f7f4ba4..f8d4814ad 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -567,6 +567,8 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_shift = 0; /* not needed, zone is physically contiguous */
pg_sz = 0;
align = RTE_CACHE_LINE_SIZE;
+   if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+   align = getpagesize();
} else {
pg_sz = getpagesize();
pg_shift = rte_bsf32(pg_sz);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7ed..774ab0f66 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -245,6 +245,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is "single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous 
objs. */
+#define MEMPOOL_F_PAGE_ALIGN 0x0040 /**< Base address is page aligned. */
 /**
  * This capability flag is advertised by a mempool handler, if the whole
  * memory area containing the objects must be physically contiguous.
-- 
2.13.6



[dpdk-dev] [RFC 2/7] lib/mbuf: enable parse flags when create mempool

2018-02-27 Thread Qi Zhang
This gives the application the option to configure each memory chunk's
size precisely (e.g. via MEMPOOL_F_NO_SPREAD).

Signed-off-by: Qi Zhang 
---
 lib/librte_mbuf/rte_mbuf.c | 15 ---
 lib/librte_mbuf/rte_mbuf.h |  8 +++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..5fd91c87c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -125,7 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 struct rte_mempool * __rte_experimental
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-   int socket_id, const char *ops_name)
+   unsigned int flags, int socket_id, const char *ops_name)
 {
struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
@@ -145,7 +145,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
mbp_priv.mbuf_priv_size = priv_size;
 
mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
if (mp == NULL)
return NULL;
 
@@ -179,9 +179,18 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n, int socket_id)
 {
return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
-   data_room_size, socket_id, NULL);
+   data_room_size, 0, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with NO_SPREAD */
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+   unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+   unsigned int flags, int socket_id)
+{
+   return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
+   data_room_size, flags, socket_id, NULL);
+}
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..6f6af42a8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1079,6 +1079,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id);
 
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+   unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
+   unsigned flags, int socket_id);
+
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
@@ -1119,7 +1125,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 struct rte_mempool * __rte_experimental
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-   int socket_id, const char *ops_name);
+   unsigned int flags, int socket_id, const char *ops_name);
 
 /**
  * Get the data room size of mbufs stored in a pktmbuf_pool
-- 
2.13.6



[dpdk-dev] [RFC 5/7] net/af_xdp: enable share mempool

2018-02-27 Thread Qi Zhang
Check whether the external mempool (from rx_queue_setup) is suitable
for af_xdp. If it is, it will be registered to the af_xdp socket
directly and there will be no packet data copy on Rx and Tx.

Signed-off-by: Qi Zhang 
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 191 +++-
 1 file changed, 125 insertions(+), 66 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 3c534c77c..d0939022b 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -60,7 +60,6 @@ struct xdp_umem {
unsigned int frame_size;
unsigned int frame_size_log2;
unsigned int nframes;
-   int mr_fd;
struct rte_mempool *mb_pool;
 };
 
@@ -73,6 +72,7 @@ struct pmd_internals {
struct xdp_queue tx;
struct xdp_umem *umem;
struct rte_mempool *ext_mb_pool;
+   uint8_t share_mb_pool;
 
unsigned long rx_pkts;
unsigned long rx_bytes;
@@ -162,20 +162,30 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
char *pkt;
uint32_t idx = descs[i].idx;
 
-   mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
-   rte_pktmbuf_pkt_len(mbuf) =
-   rte_pktmbuf_data_len(mbuf) =
-   descs[i].len;
-   if (mbuf) {
-   pkt = get_pkt_data(internals, idx, descs[i].offset);
-   memcpy(rte_pktmbuf_mtod(mbuf, void *),
-  pkt, descs[i].len);
-   rx_bytes += descs[i].len;
-   bufs[count++] = mbuf;
+   if (!internals->share_mb_pool) {
+   mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
+   rte_pktmbuf_pkt_len(mbuf) =
+   rte_pktmbuf_data_len(mbuf) =
+   descs[i].len;
+   if (mbuf) {
+   pkt = get_pkt_data(internals, idx,
+  descs[i].offset);
+   memcpy(rte_pktmbuf_mtod(mbuf, void *), pkt,
+  descs[i].len);
+   rx_bytes += descs[i].len;
+   bufs[count++] = mbuf;
+   } else {
+   dropped++;
+   }
+   rte_pktmbuf_free(idx_to_mbuf(internals, idx));
} else {
-   dropped++;
+   mbuf = idx_to_mbuf(internals, idx);
+   rte_pktmbuf_pkt_len(mbuf) =
+   rte_pktmbuf_data_len(mbuf) =
+   descs[i].len;
+   bufs[count++] = mbuf;
+   rx_bytes += descs[i].len;
}
-   rte_pktmbuf_free(idx_to_mbuf(internals, idx));
}
 
internals->rx_pkts += (rcvd - dropped);
@@ -209,51 +219,71 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint16_t i, valid;
unsigned long tx_bytes = 0;
int ret;
+   uint8_t share_mempool = 0;
 
nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
  nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
 
if (txq->num_free < ETH_AF_XDP_TX_BATCH_SIZE * 2) {
int n = xq_deq(txq, descs, ETH_AF_XDP_TX_BATCH_SIZE);
-
for (i = 0; i < n; i++)
rte_pktmbuf_free(idx_to_mbuf(internals, descs[i].idx));
}
 
nb_pkts = nb_pkts > txq->num_free ? txq->num_free : nb_pkts;
-   ret = rte_mempool_get_bulk(internals->umem->mb_pool,
-  (void *)mbufs,
-  nb_pkts);
-   if (ret)
+   if (nb_pkts == 0)
return 0;
 
+   if (bufs[0]->pool == internals->ext_mb_pool && internals->share_mb_pool)
+   share_mempool = 1;
+
+   if (!share_mempool) {
+   ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+  (void *)mbufs,
+  nb_pkts);
+   if (ret)
+   return 0;
+   }
+
valid = 0;
for (i = 0; i < nb_pkts; i++) {
char *pkt;
-   unsigned int buf_len =
-   internals->umem->frame_size - ETH_AF_XDP_DATA_HEADROOM;
mbuf = bufs[i];
-   if (mbuf->pkt_len <= buf_len) {
-   descs[valid].idx = mbuf_to_idx(internals, mbufs[i]);
-   descs[valid].offset = ETH_AF_XDP_DATA_HEADROOM;
-   descs[valid].flags = 0;
-   descs[valid].len = mbuf->pkt_len;
-   pkt = get_pkt_data(internals, descs[i].idx,
-  descs[i].off

[dpdk-dev] [RFC 4/7] net/af_xdp: use mbuf mempool for buffer management

2018-02-27 Thread Qi Zhang
Now the af_xdp registered memory buffer is managed by an rte_mempool.
An mbuf allocated from the rte_mempool can be converted to a descriptor
index and vice versa.

Signed-off-by: Qi Zhang 
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 165 +---
 1 file changed, 97 insertions(+), 68 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 4eb8a2c28..3c534c77c 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -43,7 +43,11 @@
 
 #define ETH_AF_XDP_FRAME_SIZE  2048
 #define ETH_AF_XDP_NUM_BUFFERS 131072
-#define ETH_AF_XDP_DATA_HEADROOM   0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD   192
+/* data start from offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM \
+   (ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_RING_SIZE  1024
 #define ETH_AF_XDP_DFLT_QUEUE_IDX  0
 
@@ -57,6 +61,7 @@ struct xdp_umem {
unsigned int frame_size_log2;
unsigned int nframes;
int mr_fd;
+   struct rte_mempool *mb_pool;
 };
 
 struct pmd_internals {
@@ -67,7 +72,7 @@ struct pmd_internals {
struct xdp_queue rx;
struct xdp_queue tx;
struct xdp_umem *umem;
-   struct rte_mempool *mb_pool;
+   struct rte_mempool *ext_mb_pool;
 
unsigned long rx_pkts;
unsigned long rx_bytes;
@@ -80,7 +85,6 @@ struct pmd_internals {
uint16_t port_id;
uint16_t queue_idx;
int ring_size;
-   struct rte_ring *buf_ring;
 };
 
 static const char * const valid_arguments[] = {
@@ -106,6 +110,21 @@ static void *get_pkt_data(struct pmd_internals *internals,
   offset);
 }
 
+static uint32_t
+mbuf_to_idx(struct pmd_internals *internals, struct rte_mbuf *mbuf)
+{
+   return (uint32_t)(((uint64_t)mbuf->buf_addr -
+  (uint64_t)internals->umem->buffer) >>
+ internals->umem->frame_size_log2);
+}
+
+static struct rte_mbuf *
+idx_to_mbuf(struct pmd_internals *internals, uint32_t idx)
+{
+   return (struct rte_mbuf *)(void *)(internals->umem->buffer + (idx
+   << internals->umem->frame_size_log2) + 0x40);
+}
+
 static uint16_t
 eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
@@ -120,17 +139,18 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
  nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
 
struct xdp_desc descs[ETH_AF_XDP_RX_BATCH_SIZE];
-   void *indexes[ETH_AF_XDP_RX_BATCH_SIZE];
+   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
int rcvd, i;
/* fill rx ring */
if (rxq->num_free >= ETH_AF_XDP_RX_BATCH_SIZE) {
-   int n = rte_ring_dequeue_bulk(internals->buf_ring,
- indexes,
- ETH_AF_XDP_RX_BATCH_SIZE,
- NULL);
-   for (i = 0; i < n; i++)
-   descs[i].idx = (uint32_t)((long int)indexes[i]);
-   xq_enq(rxq, descs, n);
+   int ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+(void *)mbufs,
+ETH_AF_XDP_RX_BATCH_SIZE);
+   if (!ret) {
+   for (i = 0; i < ETH_AF_XDP_RX_BATCH_SIZE; i++)
+   descs[i].idx = mbuf_to_idx(internals, mbufs[i]);
+   xq_enq(rxq, descs, ETH_AF_XDP_RX_BATCH_SIZE);
+   }
}
 
/* read data */
@@ -142,7 +162,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
char *pkt;
uint32_t idx = descs[i].idx;
 
-   mbuf = rte_pktmbuf_alloc(internals->mb_pool);
+   mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
rte_pktmbuf_pkt_len(mbuf) =
rte_pktmbuf_data_len(mbuf) =
descs[i].len;
@@ -155,11 +175,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
} else {
dropped++;
}
-   indexes[i] = (void *)((long int)idx);
+   rte_pktmbuf_free(idx_to_mbuf(internals, idx));
}
 
-   rte_ring_enqueue_bulk(internals->buf_ring, indexes, rcvd, NULL);
-
internals->rx_pkts += (rcvd - dropped);
internals->rx_bytes += rx_bytes;
internals->rx_dropped += dropped;
@@ -187,9 +205,10 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
struct xdp_queue *txq = &internals->tx;
struct rte_mbuf *mbuf;
struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE];
-   void *indexes[ETH_AF_XDP_TX_BATCH_SIZE];
+   struct rte_mbuf *mbufs[ETH_AF_XDP_TX_

[dpdk-dev] [RFC 6/7] net/af_xdp: load BPF file

2018-02-27 Thread Qi Zhang
Add libbpf and libelf dependencies in the Makefile.
During initialization, the BPF file "xdpsock_kern.o" will be loaded.
The driver will then try to link the XDP fd in DRV mode first, falling
back to SKB mode if that fails.
The link will be released during dev_close.

Note: this is a workaround; af_xdp may remove the BPF dependency in
the future.

Signed-off-by: Qi Zhang 
---
 drivers/net/af_xdp/Makefile |   6 +-
 drivers/net/af_xdp/bpf_load.c   | 798 
 drivers/net/af_xdp/bpf_load.h   |  65 +++
 drivers/net/af_xdp/libbpf.h | 199 +
 drivers/net/af_xdp/rte_eth_af_xdp.c |  31 +-
 mk/rte.app.mk   |   2 +-
 6 files changed, 1097 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/af_xdp/bpf_load.c
 create mode 100644 drivers/net/af_xdp/bpf_load.h
 create mode 100644 drivers/net/af_xdp/libbpf.h

diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
index ac38e20bf..a642786de 100644
--- a/drivers/net/af_xdp/Makefile
+++ b/drivers/net/af_xdp/Makefile
@@ -42,7 +42,10 @@ EXPORT_MAP := rte_pmd_af_xdp_version.map
 
 LIBABIVER := 1
 
-CFLAGS += -O3 -I/opt/af_xdp/linux_headers/include
+LINUX_HEADER_DIR := /opt/af_xdp/linux_headers/include
+TOOLS_DIR := /root/af_xdp/npg_dna-dna-linux/tools
+
+CFLAGS += -O3 -I$(LINUX_HEADER_DIR) -I$(TOOLS_DIR)/perf -I$(TOOLS_DIR)/include -Wno-error=sign-compare -Wno-error=cast-qual
 CFLAGS += $(WERROR_FLAGS)
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -52,5 +55,6 @@ LDLIBS += -lrte_bus_vdev
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += bpf_load.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/bpf_load.c b/drivers/net/af_xdp/bpf_load.c
new file mode 100644
index 0..aa632207f
--- /dev/null
+++ b/drivers/net/af_xdp/bpf_load.c
@@ -0,0 +1,798 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+#include "perf-sys.h"
+
+#define DEBUGFS "/sys/kernel/debug/tracing/"
+
+static char license[128];
+static int kern_version;
+static bool processed_sec[128];
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+int map_fd[MAX_MAPS];
+int prog_fd[MAX_PROGS];
+int event_fd[MAX_PROGS];
+int prog_cnt;
+int prog_array_fd = -1;
+
+struct bpf_map_data map_data[MAX_MAPS];
+int map_data_count = 0;
+
+static int populate_prog_array(const char *event, int prog_fd)
+{
+   int ind = atoi(event), err;
+
+   err = bpf_map_update_elem(prog_array_fd, &ind, &prog_fd, BPF_ANY);
+   if (err < 0) {
+   printf("failed to store prog_fd in prog_array\n");
+   return -1;
+   }
+   return 0;
+}
+
+static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
+{
+   bool is_socket = strncmp(event, "socket", 6) == 0;
+   bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
+   bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
+   bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+   bool is_xdp = strncmp(event, "xdp", 3) == 0;
+   bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
+   bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
+   bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
+   bool is_sockops = strncmp(event, "sockops", 7) == 0;
+   bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+   size_t insns_cnt = size / sizeof(struct bpf_insn);
+   enum bpf_prog_type prog_type;
+   char buf[256];
+   int fd, efd, err, id;
+   struct perf_event_attr attr = {};
+
+   attr.type = PERF_TYPE_TRACEPOINT;
+   attr.sample_type = PERF_SAMPLE_RAW;
+   attr.sample_period = 1;
+   attr.wakeup_events = 1;
+
+   if (is_socket) {
+   prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+   } else if (is_kprobe || is_kretprobe) {
+   prog_type = BPF_PROG_TYPE_KPROBE;
+   } else if (is_tracepoint) {
+   prog_type = BPF_PROG_TYPE_TRACEPOINT;
+   } else if (is_xdp) {
+   prog_type = BPF_PROG_TYPE_XDP;
+   } else if (is_perf_event) {
+   prog_type = BPF_PROG_TYPE_PERF_EVENT;
+   } else if (is_cgroup_skb) {
+   prog_type = BPF_PROG_TYPE_CGROUP_SKB;
+   } else if (is_cgroup_sk) {
+   prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
+   } else if (is_sockops) {
+   prog_type = BPF_PROG_TYPE_SOCK_OPS;
+   } else if (is_sk_skb) {
+   prog_type = BPF_PROG_TYPE_SK_SKB;
+   } else {
+   printf("Unknown event '%s'\n", event);
+   return -1;
+   }
+
+  

[dpdk-dev] [RFC 7/7] app/testpmd: enable parameter for mempool flags

2018-02-27 Thread Qi Zhang
Now it is possible for testpmd to create an af_xdp-friendly mempool.

Signed-off-by: Qi Zhang 
---
 app/test-pmd/parameters.c | 12 
 app/test-pmd/testpmd.c| 15 +--
 app/test-pmd/testpmd.h|  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b860..19675671e 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,7 @@ usage(char* progname)
   "--tx-first | --stats-period=PERIOD | "
   "--coremask=COREMASK --portmask=PORTMASK --numa "
   "--mbuf-size= | --total-num-mbufs= | "
+  "--mp-flags= | "
   "--nb-cores= | --nb-ports= | "
 #ifdef RTE_LIBRTE_CMDLINE
   "--eth-peers-configfile= | "
@@ -105,6 +106,7 @@ usage(char* progname)
	printf("  --socket-num=N: set socket from which all memory is allocated "
   "in NUMA mode.\n");
printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+   printf("  --mp-flags=N: set the flags when create mbuf memory pool.\n");
printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
   "in mbuf pools.\n");
	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -568,6 +570,7 @@ launch_args_parse(int argc, char** argv)
{ "ring-numa-config",   1, 0, 0 },
{ "socket-num", 1, 0, 0 },
{ "mbuf-size",  1, 0, 0 },
+   { "mp-flags",   1, 0, 0 },
{ "total-num-mbufs",1, 0, 0 },
{ "max-pkt-len",1, 0, 0 },
{ "pkt-filter-mode",1, 0, 0 },
@@ -769,6 +772,15 @@ launch_args_parse(int argc, char** argv)
rte_exit(EXIT_FAILURE,
 "mbuf-size should be > 0 and < 65536\n");
}
+   if (!strcmp(lgopts[opt_idx].name, "mp-flags")) {
+   n = atoi(optarg);
+   if (n > 0 && n <= 0x)
+   mp_flags = (uint16_t)n;
+   else
+   rte_exit(EXIT_FAILURE,
 "mp-flags should be > 0 and < 65536\n");
+   }
+
if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
n = atoi(optarg);
if (n > 1024)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e2586c..887899919 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -171,6 +171,7 @@ uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
 uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint16_t mp_flags = 0; /**< flags parsed when create mempool */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
   * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -486,6 +487,7 @@ set_def_fwd_config(void)
  */
 static void
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
+unsigned int flags,
 unsigned int socket_id)
 {
char pool_name[RTE_MEMPOOL_NAMESIZE];
@@ -503,7 +505,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
mb_size, (unsigned) mb_mempool_cache,
sizeof(struct rte_pktmbuf_pool_private),
-   socket_id, 0);
+   socket_id, flags);
if (rte_mp == NULL)
goto err;
 
@@ -518,8 +520,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
/* wrapper to rte_mempool_create() */
TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
rte_mbuf_best_mempool_ops());
-   rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-   mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+   rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+   mb_mempool_cache, 0, mbuf_seg_size, flags, socket_id);
}
 
 err:
@@ -735,13 +737,14 @@ init_config(void)
 
for (i = 0; i < num_sockets; i++)
mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-socket_ids[i]);
+mp_flags, socket_ids[i]);
} else {
if (socket_num == UMA_NO_CONFIG)
-   mbuf_pool_create(mbuf_d

[dpdk-dev] [RFC 0/7] PMD driver for AF_XDP

2018-02-27 Thread Qi Zhang
The RFC patches add a new PMD driver for AF_XDP, a proposed faster
version of the AF_PACKET interface in Linux. See the links below for an
introduction to AF_XDP:
https://fosdem.org/2018/schedule/event/af_xdp/
https://lwn.net/Articles/745934/

This patchset is based on v18.02.
It also requires a Linux kernel with the AF_XDP RFC patches below
applied.
https://patchwork.ozlabs.org/patch/867961/
https://patchwork.ozlabs.org/patch/867960/
https://patchwork.ozlabs.org/patch/867938/
https://patchwork.ozlabs.org/patch/867939/
https://patchwork.ozlabs.org/patch/867940/
https://patchwork.ozlabs.org/patch/867941/
https://patchwork.ozlabs.org/patch/867942/
https://patchwork.ozlabs.org/patch/867943/
https://patchwork.ozlabs.org/patch/867944/
https://patchwork.ozlabs.org/patch/867945/
https://patchwork.ozlabs.org/patch/867946/
https://patchwork.ozlabs.org/patch/867947/
https://patchwork.ozlabs.org/patch/867948/
https://patchwork.ozlabs.org/patch/867949/
https://patchwork.ozlabs.org/patch/867950/
https://patchwork.ozlabs.org/patch/867951/
https://patchwork.ozlabs.org/patch/867952/
https://patchwork.ozlabs.org/patch/867953/
https://patchwork.ozlabs.org/patch/867954/
https://patchwork.ozlabs.org/patch/867955/
https://patchwork.ozlabs.org/patch/867956/
https://patchwork.ozlabs.org/patch/867957/
https://patchwork.ozlabs.org/patch/867958/
https://patchwork.ozlabs.org/patch/867959/

There is no clean upstream target yet since the kernel patches are still
in the RFC stage. The purpose of this patchset is to let anyone evaluate
AF_XDP with a DPDK application and to gather feedback for further
improvement.

To try the new PMD:
1. Compile and install a kernel with the above patches applied.
2. Configure $LINUX_HEADER_DIR (the dir of "make headers_install")
   and $TOOLS_DIR (dir at /tools) in drivers/net/af_xdp/Makefile
   before compiling DPDK.
3. Make sure libelf and libbpf are installed.

BTW, performance tests show our PMD can reach 94%~98% of the original
benchmark when shared memory is enabled.

Qi Zhang (7):
  net/af_xdp: new PMD driver
  lib/mbuf: enable parse flags when create mempool
  lib/mempool: allow page size aligned mempool
  net/af_xdp: use mbuf mempool for buffer management
  net/af_xdp: enable share mempool
  net/af_xdp: load BPF file
  app/testpmd: enable parameter for mempool flags

 app/test-pmd/parameters.c |  12 +
 app/test-pmd/testpmd.c|  15 +-
 app/test-pmd/testpmd.h|   1 +
 config/common_base|   5 +
 config/common_linuxapp|   1 +
 drivers/net/Makefile  |   1 +
 drivers/net/af_xdp/Makefile   |  60 ++
 drivers/net/af_xdp/bpf_load.c | 798 +++
 drivers/net/af_xdp/bpf_load.h |  65 ++
 drivers/net/af_xdp/libbpf.h   | 199 ++
 drivers/net/af_xdp/meson.build|   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c   | 878 ++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 drivers/net/af_xdp/xdpsock_queue.h|  62 ++
 lib/librte_mbuf/rte_mbuf.c|  15 +-
 lib/librte_mbuf/rte_mbuf.h|   8 +-
 lib/librte_mempool/rte_mempool.c  |   2 +
 lib/librte_mempool/rte_mempool.h  |   1 +
 mk/rte.app.mk |   1 +
 19 files changed, 2125 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/bpf_load.c
 create mode 100644 drivers/net/af_xdp/bpf_load.h
 create mode 100644 drivers/net/af_xdp/libbpf.h
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
 create mode 100644 drivers/net/af_xdp/xdpsock_queue.h

-- 
2.13.6



[dpdk-dev] [RFC 1/7] net/af_xdp: new PMD driver

2018-02-27 Thread Qi Zhang
This is the vanilla version.
Packet data is copied between the AF_XDP memory buffer and the mbuf
mempool. Indexes of the memory buffers are simply managed by a FIFO ring.

Signed-off-by: Qi Zhang 
---
 config/common_base|   5 +
 config/common_linuxapp|   1 +
 drivers/net/Makefile  |   1 +
 drivers/net/af_xdp/Makefile   |  56 ++
 drivers/net/af_xdp/meson.build|   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c   | 763 ++
 drivers/net/af_xdp/rte_pmd_af_xdp_version.map |   4 +
 drivers/net/af_xdp/xdpsock_queue.h|  62 +++
 mk/rte.app.mk |   1 +
 9 files changed, 900 insertions(+)
 create mode 100644 drivers/net/af_xdp/Makefile
 create mode 100644 drivers/net/af_xdp/meson.build
 create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
 create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
 create mode 100644 drivers/net/af_xdp/xdpsock_queue.h

diff --git a/config/common_base b/config/common_base
index ad03cf433..84b7b3b7e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -368,6 +368,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile software PMD backed by AF_XDP sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+
+#
 # Compile link bonding PMD library
 #
 CONFIG_RTE_LIBRTE_PMD_BOND=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..3b10695b6 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -16,6 +16,7 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_AF_XDP=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e1127326b..409234ac3 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -9,6 +9,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD),d)
 endif
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += af_xdp
 DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_AVF_PMD) += avf
 DIRS-$(CONFIG_RTE_LIBRTE_AVP_PMD) += avp
diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
new file mode 100644
index 0..ac38e20bf
--- /dev/null
+++ b/drivers/net/af_xdp/Makefile
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014 John W. Linville 
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2014 6WIND S.A.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_af_xdp.a
+
+EXPORT_MAP := rte_pmd_af_xdp_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3 -I/opt/af_xdp/linux_headers/include
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
new file mode 100644
index 0..4b5299c8e
--- /dev/null
+++ b/drivers/net/af_xdp/meson.build
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 I

[dpdk-dev] [RFC 3/7] lib/mempool: allow page size aligned mempool

2018-02-27 Thread Qi Zhang
Allow creating a mempool with a page-size-aligned base address.

Signed-off-by: Qi Zhang 
---
 lib/librte_mempool/rte_mempool.c | 2 ++
 lib/librte_mempool/rte_mempool.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f7f4ba4..f8d4814ad 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -567,6 +567,8 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_shift = 0; /* not needed, zone is physically contiguous */
pg_sz = 0;
align = RTE_CACHE_LINE_SIZE;
+   if (mp->flags & MEMPOOL_F_PAGE_ALIGN)
+   align = getpagesize();
} else {
pg_sz = getpagesize();
pg_shift = rte_bsf32(pg_sz);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7ed..774ab0f66 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -245,6 +245,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET 0x0008 /**< Default get is 
"single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous 
objs. */
+#define MEMPOOL_F_PAGE_ALIGN 0x0040 /**< Base address is page aligned. */
 /**
  * This capability flag is advertised by a mempool handler, if the whole
  * memory area containing the objects must be physically contiguous.
-- 
2.13.6



[dpdk-dev] [RFC 2/7] lib/mbuf: enable parse flags when create mempool

2018-02-27 Thread Qi Zhang
This gives the application the option to configure each memory chunk's
size precisely (by MEMPOOL_F_NO_SPREAD).

Signed-off-by: Qi Zhang 
---
 lib/librte_mbuf/rte_mbuf.c | 15 ---
 lib/librte_mbuf/rte_mbuf.h |  8 +++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..5fd91c87c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -125,7 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
 struct rte_mempool * __rte_experimental
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-   int socket_id, const char *ops_name)
+   unsigned int flags, int socket_id, const char *ops_name)
 {
struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
@@ -145,7 +145,7 @@ rte_pktmbuf_pool_create_by_ops(const char *name, unsigned 
int n,
mbp_priv.mbuf_priv_size = priv_size;
 
mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+sizeof(struct rte_pktmbuf_pool_private), socket_id, flags);
if (mp == NULL)
return NULL;
 
@@ -179,9 +179,18 @@ rte_pktmbuf_pool_create(const char *name, unsigned int n,
int socket_id)
 {
return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
-   data_room_size, socket_id, NULL);
+   data_room_size, 0, socket_id, NULL);
 }
 
+/* helper to create a mbuf pool with NO_SPREAD */
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+   unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+   unsigned int flags, int socket_id)
+{
+   return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
+   data_room_size, flags, socket_id, NULL);
+}
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..6f6af42a8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1079,6 +1079,12 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id);
 
+struct rte_mempool *
+rte_pktmbuf_pool_create_with_flags(const char *name, unsigned int n,
+   unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
+   unsigned flags, int socket_id);
+
+
 /**
  * Create a mbuf pool with a given mempool ops name
  *
@@ -1119,7 +1125,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
 struct rte_mempool * __rte_experimental
 rte_pktmbuf_pool_create_by_ops(const char *name, unsigned int n,
unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
-   int socket_id, const char *ops_name);
+   unsigned int flags, int socket_id, const char *ops_name);
 
 /**
  * Get the data room size of mbufs stored in a pktmbuf_pool
-- 
2.13.6



[dpdk-dev] [RFC 6/7] net/af_xdp: load BPF file

2018-02-27 Thread Qi Zhang
Add libbpf and libelf dependencies in the Makefile.
During initialization, the BPF file "xdpsock_kern.o" will be loaded.
The driver will then always try to link the XDP fd in DRV mode first,
falling back to SKB mode if that fails.
The link will be released during dev_close.

Note: this is a workaround solution; AF_XDP may remove the BPF
dependency in the future.

Signed-off-by: Qi Zhang 
---
 drivers/net/af_xdp/Makefile |   6 +-
 drivers/net/af_xdp/bpf_load.c   | 798 
 drivers/net/af_xdp/bpf_load.h   |  65 +++
 drivers/net/af_xdp/libbpf.h | 199 +
 drivers/net/af_xdp/rte_eth_af_xdp.c |  31 +-
 mk/rte.app.mk   |   2 +-
 6 files changed, 1097 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/af_xdp/bpf_load.c
 create mode 100644 drivers/net/af_xdp/bpf_load.h
 create mode 100644 drivers/net/af_xdp/libbpf.h

diff --git a/drivers/net/af_xdp/Makefile b/drivers/net/af_xdp/Makefile
index ac38e20bf..a642786de 100644
--- a/drivers/net/af_xdp/Makefile
+++ b/drivers/net/af_xdp/Makefile
@@ -42,7 +42,10 @@ EXPORT_MAP := rte_pmd_af_xdp_version.map
 
 LIBABIVER := 1
 
-CFLAGS += -O3 -I/opt/af_xdp/linux_headers/include
+LINUX_HEADER_DIR := /opt/af_xdp/linux_headers/include
+TOOLS_DIR := /root/af_xdp/npg_dna-dna-linux/tools
+
+CFLAGS += -O3 -I$(LINUX_HEADER_DIR) -I$(TOOLS_DIR)/perf -I$(TOOLS_DIR)/include 
-Wno-error=sign-compare -Wno-error=cast-qual
 CFLAGS += $(WERROR_FLAGS)
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -52,5 +55,6 @@ LDLIBS += -lrte_bus_vdev
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += rte_eth_af_xdp.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_XDP) += bpf_load.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_xdp/bpf_load.c b/drivers/net/af_xdp/bpf_load.c
new file mode 100644
index 0..aa632207f
--- /dev/null
+++ b/drivers/net/af_xdp/bpf_load.c
@@ -0,0 +1,798 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+#include "perf-sys.h"
+
+#define DEBUGFS "/sys/kernel/debug/tracing/"
+
+static char license[128];
+static int kern_version;
+static bool processed_sec[128];
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+int map_fd[MAX_MAPS];
+int prog_fd[MAX_PROGS];
+int event_fd[MAX_PROGS];
+int prog_cnt;
+int prog_array_fd = -1;
+
+struct bpf_map_data map_data[MAX_MAPS];
+int map_data_count = 0;
+
+static int populate_prog_array(const char *event, int prog_fd)
+{
+   int ind = atoi(event), err;
+
+   err = bpf_map_update_elem(prog_array_fd, &ind, &prog_fd, BPF_ANY);
+   if (err < 0) {
+   printf("failed to store prog_fd in prog_array\n");
+   return -1;
+   }
+   return 0;
+}
+
+static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
+{
+   bool is_socket = strncmp(event, "socket", 6) == 0;
+   bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
+   bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
+   bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+   bool is_xdp = strncmp(event, "xdp", 3) == 0;
+   bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
+   bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
+   bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
+   bool is_sockops = strncmp(event, "sockops", 7) == 0;
+   bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+   size_t insns_cnt = size / sizeof(struct bpf_insn);
+   enum bpf_prog_type prog_type;
+   char buf[256];
+   int fd, efd, err, id;
+   struct perf_event_attr attr = {};
+
+   attr.type = PERF_TYPE_TRACEPOINT;
+   attr.sample_type = PERF_SAMPLE_RAW;
+   attr.sample_period = 1;
+   attr.wakeup_events = 1;
+
+   if (is_socket) {
+   prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+   } else if (is_kprobe || is_kretprobe) {
+   prog_type = BPF_PROG_TYPE_KPROBE;
+   } else if (is_tracepoint) {
+   prog_type = BPF_PROG_TYPE_TRACEPOINT;
+   } else if (is_xdp) {
+   prog_type = BPF_PROG_TYPE_XDP;
+   } else if (is_perf_event) {
+   prog_type = BPF_PROG_TYPE_PERF_EVENT;
+   } else if (is_cgroup_skb) {
+   prog_type = BPF_PROG_TYPE_CGROUP_SKB;
+   } else if (is_cgroup_sk) {
+   prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
+   } else if (is_sockops) {
+   prog_type = BPF_PROG_TYPE_SOCK_OPS;
+   } else if (is_sk_skb) {
+   prog_type = BPF_PROG_TYPE_SK_SKB;
+   } else {
+   printf("Unknown event '%s'\n", event);
+   return -1;
+   }
+
+  

[dpdk-dev] [RFC 7/7] app/testpmd: enable parameter for mempool flags

2018-02-27 Thread Qi Zhang
Now, it is possible for testpmd to create an AF_XDP-friendly mempool.

Signed-off-by: Qi Zhang 
---
 app/test-pmd/parameters.c | 12 
 app/test-pmd/testpmd.c| 15 +--
 app/test-pmd/testpmd.h|  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b860..19675671e 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,7 @@ usage(char* progname)
   "--tx-first | --stats-period=PERIOD | "
   "--coremask=COREMASK --portmask=PORTMASK --numa "
   "--mbuf-size= | --total-num-mbufs= | "
+  "--mp-flags= | "
   "--nb-cores= | --nb-ports= | "
 #ifdef RTE_LIBRTE_CMDLINE
   "--eth-peers-configfile= | "
@@ -105,6 +106,7 @@ usage(char* progname)
printf("  --socket-num=N: set socket from which all memory is allocated 
"
   "in NUMA mode.\n");
printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+   printf("  --mp-flags=N: set the flags when create mbuf memory pool.\n");
printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
   "in mbuf pools.\n");
printf("  --max-pkt-len=N: set the maximum size of packet to N 
bytes.\n");
@@ -568,6 +570,7 @@ launch_args_parse(int argc, char** argv)
{ "ring-numa-config",   1, 0, 0 },
{ "socket-num", 1, 0, 0 },
{ "mbuf-size",  1, 0, 0 },
+   { "mp-flags",   1, 0, 0 },
{ "total-num-mbufs",1, 0, 0 },
{ "max-pkt-len",1, 0, 0 },
{ "pkt-filter-mode",1, 0, 0 },
@@ -769,6 +772,15 @@ launch_args_parse(int argc, char** argv)
rte_exit(EXIT_FAILURE,
 "mbuf-size should be > 0 and < 
65536\n");
}
+   if (!strcmp(lgopts[opt_idx].name, "mp-flags")) {
+   n = atoi(optarg);
+   if (n > 0 && n <= 0xffff)
+   mp_flags = (uint16_t)n;
+   else
+   rte_exit(EXIT_FAILURE,
+"mp-flags should be > 0 and < 
65536\n");
+   }
+
if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
n = atoi(optarg);
if (n > 1024)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e2586c..887899919 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -171,6 +171,7 @@ uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
 uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint16_t mp_flags = 0; /**< flags parsed when create mempool */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
   * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -486,6 +487,7 @@ set_def_fwd_config(void)
  */
 static void
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
+unsigned int flags,
 unsigned int socket_id)
 {
char pool_name[RTE_MEMPOOL_NAMESIZE];
@@ -503,7 +505,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
mb_size, (unsigned) mb_mempool_cache,
sizeof(struct rte_pktmbuf_pool_private),
-   socket_id, 0);
+   socket_id, flags);
if (rte_mp == NULL)
goto err;
 
@@ -518,8 +520,8 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
/* wrapper to rte_mempool_create() */
TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
rte_mbuf_best_mempool_ops());
-   rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-   mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+   rte_mp = rte_pktmbuf_pool_create_with_flags(pool_name, nb_mbuf,
+   mb_mempool_cache, 0, mbuf_seg_size, flags, socket_id);
}
 
 err:
@@ -735,13 +737,14 @@ init_config(void)
 
for (i = 0; i < num_sockets; i++)
mbuf_pool_create(mbuf_data_size, nb_mbuf_per_pool,
-socket_ids[i]);
+mp_flags, socket_ids[i]);
} else {
if (socket_num == UMA_NO_CONFIG)
-   mbuf_pool_create(mbuf_d

[dpdk-dev] [RFC 5/7] net/af_xdp: enable share mempool

2018-02-27 Thread Qi Zhang
Check whether the external mempool (from rx_queue_setup) is suitable
for AF_XDP; if it is, it will be registered with the AF_XDP socket
directly and there will be no packet data copy on Rx and Tx.

Signed-off-by: Qi Zhang 
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 191 +++-
 1 file changed, 125 insertions(+), 66 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c 
b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 3c534c77c..d0939022b 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -60,7 +60,6 @@ struct xdp_umem {
unsigned int frame_size;
unsigned int frame_size_log2;
unsigned int nframes;
-   int mr_fd;
struct rte_mempool *mb_pool;
 };
 
@@ -73,6 +72,7 @@ struct pmd_internals {
struct xdp_queue tx;
struct xdp_umem *umem;
struct rte_mempool *ext_mb_pool;
+   uint8_t share_mb_pool;
 
unsigned long rx_pkts;
unsigned long rx_bytes;
@@ -162,20 +162,30 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
char *pkt;
uint32_t idx = descs[i].idx;
 
-   mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
-   rte_pktmbuf_pkt_len(mbuf) =
-   rte_pktmbuf_data_len(mbuf) =
-   descs[i].len;
-   if (mbuf) {
-   pkt = get_pkt_data(internals, idx, descs[i].offset);
-   memcpy(rte_pktmbuf_mtod(mbuf, void *),
-  pkt, descs[i].len);
-   rx_bytes += descs[i].len;
-   bufs[count++] = mbuf;
+   if (!internals->share_mb_pool) {
+   mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
+   rte_pktmbuf_pkt_len(mbuf) =
+   rte_pktmbuf_data_len(mbuf) =
+   descs[i].len;
+   if (mbuf) {
+   pkt = get_pkt_data(internals, idx,
+  descs[i].offset);
+   memcpy(rte_pktmbuf_mtod(mbuf, void *), pkt,
+  descs[i].len);
+   rx_bytes += descs[i].len;
+   bufs[count++] = mbuf;
+   } else {
+   dropped++;
+   }
+   rte_pktmbuf_free(idx_to_mbuf(internals, idx));
} else {
-   dropped++;
+   mbuf = idx_to_mbuf(internals, idx);
+   rte_pktmbuf_pkt_len(mbuf) =
+   rte_pktmbuf_data_len(mbuf) =
+   descs[i].len;
+   bufs[count++] = mbuf;
+   rx_bytes += descs[i].len;
}
-   rte_pktmbuf_free(idx_to_mbuf(internals, idx));
}
 
internals->rx_pkts += (rcvd - dropped);
@@ -209,51 +219,71 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
uint16_t i, valid;
unsigned long tx_bytes = 0;
int ret;
+   uint8_t share_mempool = 0;
 
nb_pkts = nb_pkts < ETH_AF_XDP_TX_BATCH_SIZE ?
  nb_pkts : ETH_AF_XDP_TX_BATCH_SIZE;
 
if (txq->num_free < ETH_AF_XDP_TX_BATCH_SIZE * 2) {
int n = xq_deq(txq, descs, ETH_AF_XDP_TX_BATCH_SIZE);
-
for (i = 0; i < n; i++)
rte_pktmbuf_free(idx_to_mbuf(internals, descs[i].idx));
}
 
nb_pkts = nb_pkts > txq->num_free ? txq->num_free : nb_pkts;
-   ret = rte_mempool_get_bulk(internals->umem->mb_pool,
-  (void *)mbufs,
-  nb_pkts);
-   if (ret)
+   if (nb_pkts == 0)
return 0;
 
+   if (bufs[0]->pool == internals->ext_mb_pool && internals->share_mb_pool)
+   share_mempool = 1;
+
+   if (!share_mempool) {
+   ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+  (void *)mbufs,
+  nb_pkts);
+   if (ret)
+   return 0;
+   }
+
valid = 0;
for (i = 0; i < nb_pkts; i++) {
char *pkt;
-   unsigned int buf_len =
-   internals->umem->frame_size - ETH_AF_XDP_DATA_HEADROOM;
mbuf = bufs[i];
-   if (mbuf->pkt_len <= buf_len) {
-   descs[valid].idx = mbuf_to_idx(internals, mbufs[i]);
-   descs[valid].offset = ETH_AF_XDP_DATA_HEADROOM;
-   descs[valid].flags = 0;
-   descs[valid].len = mbuf->pkt_len;
-   pkt = get_pkt_data(internals, descs[i].idx,
-  descs[i].off

[dpdk-dev] [RFC 4/7] net/af_xdp: use mbuf mempool for buffer management

2018-02-27 Thread Qi Zhang
Now, the AF_XDP registered memory buffer is managed by rte_mempool.
An mbuf allocated from the rte_mempool can be converted to a descriptor
index and vice versa.

Signed-off-by: Qi Zhang 
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 165 +---
 1 file changed, 97 insertions(+), 68 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c 
b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 4eb8a2c28..3c534c77c 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -43,7 +43,11 @@
 
 #define ETH_AF_XDP_FRAME_SIZE  2048
 #define ETH_AF_XDP_NUM_BUFFERS 131072
-#define ETH_AF_XDP_DATA_HEADROOM   0
+/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
+#define ETH_AF_XDP_MBUF_OVERHEAD   192
+/* data start from offset 320 (192 + 128) bytes */
+#define ETH_AF_XDP_DATA_HEADROOM \
+   (ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
 #define ETH_AF_XDP_DFLT_RING_SIZE  1024
 #define ETH_AF_XDP_DFLT_QUEUE_IDX  0
 
@@ -57,6 +61,7 @@ struct xdp_umem {
unsigned int frame_size_log2;
unsigned int nframes;
int mr_fd;
+   struct rte_mempool *mb_pool;
 };
 
 struct pmd_internals {
@@ -67,7 +72,7 @@ struct pmd_internals {
struct xdp_queue rx;
struct xdp_queue tx;
struct xdp_umem *umem;
-   struct rte_mempool *mb_pool;
+   struct rte_mempool *ext_mb_pool;
 
unsigned long rx_pkts;
unsigned long rx_bytes;
@@ -80,7 +85,6 @@ struct pmd_internals {
uint16_t port_id;
uint16_t queue_idx;
int ring_size;
-   struct rte_ring *buf_ring;
 };
 
 static const char * const valid_arguments[] = {
@@ -106,6 +110,21 @@ static void *get_pkt_data(struct pmd_internals *internals,
   offset);
 }
 
+static uint32_t
+mbuf_to_idx(struct pmd_internals *internals, struct rte_mbuf *mbuf)
+{
+   return (uint32_t)(((uint64_t)mbuf->buf_addr -
+  (uint64_t)internals->umem->buffer) >>
+ internals->umem->frame_size_log2);
+}
+
+static struct rte_mbuf *
+idx_to_mbuf(struct pmd_internals *internals, uint32_t idx)
+{
+   return (struct rte_mbuf *)(void *)(internals->umem->buffer + (idx
+   << internals->umem->frame_size_log2) + 0x40);
+}
+
 static uint16_t
 eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
@@ -120,17 +139,18 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
  nb_pkts : ETH_AF_XDP_RX_BATCH_SIZE;
 
struct xdp_desc descs[ETH_AF_XDP_RX_BATCH_SIZE];
-   void *indexes[ETH_AF_XDP_RX_BATCH_SIZE];
+   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
int rcvd, i;
/* fill rx ring */
if (rxq->num_free >= ETH_AF_XDP_RX_BATCH_SIZE) {
-   int n = rte_ring_dequeue_bulk(internals->buf_ring,
- indexes,
- ETH_AF_XDP_RX_BATCH_SIZE,
- NULL);
-   for (i = 0; i < n; i++)
-   descs[i].idx = (uint32_t)((long int)indexes[i]);
-   xq_enq(rxq, descs, n);
+   int ret = rte_mempool_get_bulk(internals->umem->mb_pool,
+(void *)mbufs,
+ETH_AF_XDP_RX_BATCH_SIZE);
+   if (!ret) {
+   for (i = 0; i < ETH_AF_XDP_RX_BATCH_SIZE; i++)
+   descs[i].idx = mbuf_to_idx(internals, mbufs[i]);
+   xq_enq(rxq, descs, ETH_AF_XDP_RX_BATCH_SIZE);
+   }
}
 
/* read data */
@@ -142,7 +162,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t 
nb_pkts)
char *pkt;
uint32_t idx = descs[i].idx;
 
-   mbuf = rte_pktmbuf_alloc(internals->mb_pool);
+   mbuf = rte_pktmbuf_alloc(internals->ext_mb_pool);
rte_pktmbuf_pkt_len(mbuf) =
rte_pktmbuf_data_len(mbuf) =
descs[i].len;
@@ -155,11 +175,9 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
} else {
dropped++;
}
-   indexes[i] = (void *)((long int)idx);
+   rte_pktmbuf_free(idx_to_mbuf(internals, idx));
}
 
-   rte_ring_enqueue_bulk(internals->buf_ring, indexes, rcvd, NULL);
-
internals->rx_pkts += (rcvd - dropped);
internals->rx_bytes += rx_bytes;
internals->rx_dropped += dropped;
@@ -187,9 +205,10 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
struct xdp_queue *txq = &internals->tx;
struct rte_mbuf *mbuf;
struct xdp_desc descs[ETH_AF_XDP_TX_BATCH_SIZE];
-   void *indexes[ETH_AF_XDP_TX_BATCH_SIZE];
+   struct rte_mbuf *mbufs[ETH_AF_XDP_TX_

Re: [dpdk-dev] [PATCH v2] ether: fix invalid string length in ethdev name comparison

2018-02-27 Thread Ananyev, Konstantin


> -Original Message-
> From: Awal, Mohammad Abdul
> Sent: Tuesday, February 27, 2018 8:58 AM
> To: tho...@monjalon.net
> Cc: rke...@gmail.com; dev@dpdk.org; Ananyev, Konstantin 
> ; Awal, Mohammad Abdul
> 
> Subject: [PATCH v2] ether: fix invalid string length in ethdev name comparison
> 
> The current code compares two strings upto the length of 1st string
> (searched name). If the 1st string is prefix of 2nd string (existing name),
> the string comparison returns the port_id of earliest prefix matches.
> This patch fixes the bug by using strcmp instead of strncmp.
> 
> Fixes: 9c5b8d8b9fe ("ethdev: clean port id retrieval when attaching")
> 
> Signed-off-by: Mohammad Abdul Awal 
> ---
>  lib/librte_ether/rte_ethdev.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 0590f0c..3b885a6 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -572,8 +572,7 @@ rte_eth_dev_get_port_by_name(const char *name, uint16_t 
> *port_id)
> 
>   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
>   if (rte_eth_devices[pid].state != RTE_ETH_DEV_UNUSED &&
> - !strncmp(name, rte_eth_dev_shared_data->data[pid].name,
> -  strlen(name))) {
> + !strcmp(name, rte_eth_dev_shared_data->data[pid].name)) {
>   *port_id = pid;
>   return 0;
>   }
> --

Acked-by: Konstantin Ananyev 

> 2.7.4
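
The prefix-match bug fixed above can be reproduced in isolation. Below is a
minimal standalone sketch; the port-name table and helper names are
hypothetical stand-ins for `rte_eth_dev_shared_data->data[pid].name` and the
lookup loop in `rte_eth_dev_get_port_by_name()`:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical port-name table; note that the searched name "net_null0"
 * is a prefix of the first entry. */
static const char *names[] = { "net_null01", "net_null0" };

/* Buggy lookup: strncmp() bounded by strlen(name) accepts any entry that
 * merely starts with the searched name, so the earliest prefix wins. */
static int find_buggy(const char *name)
{
	for (int i = 0; i < 2; i++)
		if (!strncmp(name, names[i], strlen(name)))
			return i;
	return -1;
}

/* Fixed lookup (what the patch does): full string comparison. */
static int find_fixed(const char *name)
{
	for (int i = 0; i < 2; i++)
		if (!strcmp(name, names[i]))
			return i;
	return -1;
}
```

With the strncmp() bound of strlen(name), looking up "net_null0" wrongly
matches "net_null01" first; strcmp() finds the exact entry.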



[dpdk-dev] NXP's DPDK Roadmap for 18.05

2018-02-27 Thread Shreyansh Jain
Following are some of the key work items which we are planning for DPDK
18.05:


1. Two rawdevice drivers
- for communication interface to interact with a specialized block 
(AIOP: Advanced I/O Processor) on dpaa2.

- and, for DMA accelerator support on dpaa2 devices to userspace

2. Crypto adapter for eventdev for dpaa1/dpaa2.
- This is directly dependent on the crypto adapter base code [1] getting
finalized within the window.


3. Change to rte_eth_dev_info for supporting driver advertised burst 
size as described in deprecation notice [2].
- this would enable applications to tune their bursts according to the
driver's preferred size, providing a knob to tune for higher
device-specific performance.


4. Meson support
- this includes dpaa1/dpaa2 components (bus/mempool/net/crypto/event)
- and for lib_rawdev

5. Integrating dpaa1/dpaa2 compilation into common_linuxapp 
configuration to allow dpaa1/dpaa2 compilation as default objects along 
with other PMDs.


6. A new ethernet driver for ENETC (using PCI bus) for NXP LS1028 SoC.

Various other trivial activities include:
-) Dynamic Logging for dpaa2 and some changes for dpaa1 as well for 
cleaning up logs.
-) Device whitelist/blacklist support for dpaa1/dpaa2 devices. This is 
dependent on the devargs work being done. [2]

-) Ethdev offload API support
-) coverity reported fixes

Flow director support for dpaa2 is also on the cards, but that would
probably be targeted only on a best-effort basis.


===
[1] http://dpdk.org/ml/archives/dev/2018-February/090807.html - RFC for 
crypto adapter for event devices
[2] http://dpdk.org/ml/archives/dev/2018-February/090157.html - 
deprecation notice for burst size change


-
Shreyansh


Re: [dpdk-dev] [PATCH 1/5] lib/ethdev: support for inline IPsec events

2018-02-27 Thread Nicolau, Radu

> -Original Message-
> From: Anoob Joseph [mailto:anoob.jos...@caviumnetworks.com]
> Sent: Tuesday, February 27, 2018 6:57 AM
> To: Nicolau, Radu ; Akhil Goyal
> ; Doherty, Declan 
> Cc: Jerin Jacob ; Narayana Prasad
> ; Nelio Laranjeiro
> ; dev@dpdk.org
> Subject: Re: [PATCH 1/5] lib/ethdev: support for inline IPsec events
> 
> Hi Radu,
> 
> Please see inline.
> 
> Thanks,
> Anoob
> 
> On 26/02/18 15:05, Nicolau, Radu wrote:
> >
> >> -Original Message-
> >> From: Anoob Joseph [mailto:anoob.jos...@caviumnetworks.com]
> >> Sent: Wednesday, February 21, 2018 5:37 AM
> >> To: Akhil Goyal ; Doherty, Declan
> >> ; Nicolau, Radu 
> >> Cc: Anoob Joseph ; Jerin Jacob
> >> ; Narayana Prasad
> >> ; Nelio Laranjeiro
> >> ; dev@dpdk.org
> >> Subject: [PATCH 1/5] lib/ethdev: support for inline IPsec events
> >>
> >> Adding support for IPsec events in rte_eth_event framework. In inline
> >> IPsec offload, the per packet protocol defined variables, like ESN,
> >> would be managed by PMD. In such cases, PMD would need IPsec events
> >> to notify application about various conditions like, ESN overflow.
> >>
> >> Signed-off-by: Anoob Joseph 
> >> ---
> >>   lib/librte_ether/rte_ethdev.h | 22 ++
> >>   1 file changed, 22 insertions(+)
> >>
> >> diff --git a/lib/librte_ether/rte_ethdev.h
> >> b/lib/librte_ether/rte_ethdev.h index 0361533..4e4e18d 100644
> >> --- a/lib/librte_ether/rte_ethdev.h
> >> +++ b/lib/librte_ether/rte_ethdev.h
> >> @@ -2438,6 +2438,27 @@ int
> >>   rte_eth_tx_done_cleanup(uint16_t port_id, uint16_t queue_id,
> >> uint32_t free_cnt);
> >>
> >>   /**
> >> + * Subtypes for IPsec offload events raised by eth device.
> >> + */
> >> +enum rte_eth_event_ipsec_subtype {
> >> +  RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW,
> >> +  /** Sequence number overflow in security offload */
> >> +  RTE_ETH_EVENT_IPSEC_MAX
> >> +  /** Max value of this enum */
> >> +};
> > I would add some more events to the list (to make it look less like a very
> specific case implementation): crypto/auth failed and undefined/unspecified
> being the most obvious.
> > Apart from this, the patchset looks fine.
> Understood your point. But crypto/auth failed would be per packet, right?
> How are we handling such error cases presently? Just want to make sure we
> are not adding two error reporting mechanisms.
The only reason for my suggestion was to keep the API as flexible and generic 
as possible.
For the inline crypto on ixgbe we only flag the mbuf with the security error 
flag, but no extra info is added. I guess we can have a ipsec crypto error 
event with a list of failed mbufs or similar. In any case, it's just a 
suggestion.

> >
> >> +
> >> +/**
> >> + * Descriptor for IPsec event. Used by eth dev to send extra
> >> +information of the
> >> + * event.
> >> + */
> >> +struct rte_eth_event_ipsec_desc {
> >> +  enum rte_eth_event_ipsec_subtype stype;
> >> +  /** Type of IPsec event */
> >> +  uint64_t md;
> >> +  /** Event specific metadata */
> >> +};
> >> +
> >> +/**
> >>* The eth device event type for interrupt, and maybe others in the
> future.
> >>*/
> >>   enum rte_eth_event_type {
> >> @@ -2448,6 +2469,7 @@ enum rte_eth_event_type {
> >>RTE_ETH_EVENT_INTR_RESET,
> >>/**< reset interrupt event, sent to VF on PF reset */
> >>RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by
> PF */
> >> +  RTE_ETH_EVENT_IPSEC,/**< IPsec offload related event */
> >>RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
> >>RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
> >>RTE_ETH_EVENT_NEW,  /**< port is probed */
> >> --
> >> 2.7.4
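
As a rough illustration of how an application could consume the proposed
event, here is a self-contained sketch. The enum and descriptor are copied
from the patch under review; the handler is hypothetical, and a real
application would receive the descriptor through a callback registered with
rte_eth_dev_callback_register() rather than calling a function directly:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Copied from the proposed patch. */
enum rte_eth_event_ipsec_subtype {
	RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW, /* sequence number overflow */
	RTE_ETH_EVENT_IPSEC_MAX           /* max value of this enum */
};

struct rte_eth_event_ipsec_desc {
	enum rte_eth_event_ipsec_subtype stype; /* type of IPsec event */
	uint64_t md;                            /* event specific metadata */
};

/* Hypothetical application handler: returns 0 if the subtype was
 * recognized, -1 otherwise. */
static int handle_ipsec_event(const struct rte_eth_event_ipsec_desc *desc)
{
	switch (desc->stype) {
	case RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW:
		/* desc->md could identify the SA; trigger rekeying here */
		printf("ESN overflow, md=%" PRIu64 "\n", desc->md);
		return 0;
	default:
		return -1;
	}
}
```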



[dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function

2018-02-27 Thread Kirill Rybalchenko
In the 18.02 release the ABI of the ethdev component was changed.
To keep compatibility with previous versions of the library,
versioning of the rte_eth_dev_filter_ctrl function was implemented.
Since a deprecation notice was issued in the 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of the rte_eth_dev_filter_ctrl function.

Signed-off-by: Kirill Rybalchenko 
---
 lib/librte_ether/rte_ethdev.c | 155 +-
 1 file changed, 2 insertions(+), 153 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..78b8376 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -3490,153 +3489,8 @@ rte_eth_dev_filter_supported(uint16_t port_id,
 }
 
 int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-   enum rte_filter_type filter_type,
-   enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-   enum rte_filter_type filter_type,
-   enum rte_filter_op filter_op, void *arg)
-{
-   struct rte_eth_fdir_info_v22 {
-   enum rte_fdir_mode mode;
-   struct rte_eth_fdir_masks mask;
-   struct rte_eth_fdir_flex_conf flex_conf;
-   uint32_t guarant_spc;
-   uint32_t best_spc;
-   uint32_t flow_types_mask[1];
-   uint32_t max_flexpayload;
-   uint32_t flex_payload_unit;
-   uint32_t max_flex_payload_segment_num;
-   uint16_t flex_payload_limit;
-   uint32_t flex_bitmask_unit;
-   uint32_t max_flex_bitmask_num;
-   };
-
-   struct rte_eth_hash_global_conf_v22 {
-   enum rte_eth_hash_function hash_func;
-   uint32_t sym_hash_enable_mask[1];
-   uint32_t valid_bit_mask[1];
-   };
-
-   struct rte_eth_hash_filter_info_v22 {
-   enum rte_eth_hash_filter_info_type info_type;
-   union {
-   uint8_t enable;
-   struct rte_eth_hash_global_conf_v22 global_conf;
-   struct rte_eth_input_set_conf input_set_conf;
-   } info;
-   };
-
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-   dev = &rte_eth_devices[port_id];
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
-   if (filter_op == RTE_ETH_FILTER_INFO) {
-   int retval;
-   struct rte_eth_fdir_info_v22 *fdir_info_v22;
-   struct rte_eth_fdir_info fdir_info;
-
-   fdir_info_v22 = (struct rte_eth_fdir_info_v22 *)arg;
-
-   retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
- filter_op, (void *)&fdir_info);
-   fdir_info_v22->mode = fdir_info.mode;
-   fdir_info_v22->mask = fdir_info.mask;
-   fdir_info_v22->flex_conf = fdir_info.flex_conf;
-   fdir_info_v22->guarant_spc = fdir_info.guarant_spc;
-   fdir_info_v22->best_spc = fdir_info.best_spc;
-   fdir_info_v22->flow_types_mask[0] =
-   (uint32_t)fdir_info.flow_types_mask[0];
-   fdir_info_v22->max_flexpayload = fdir_info.max_flexpayload;
-   fdir_info_v22->flex_payload_unit = fdir_info.flex_payload_unit;
-   fdir_info_v22->max_flex_payload_segment_num =
-   fdir_info.max_flex_payload_segment_num;
-   fdir_info_v22->flex_payload_limit =
-   fdir_info.flex_payload_limit;
-   fdir_info_v22->flex_bitmask_unit = fdir_info.flex_bitmask_unit;
-   fdir_info_v22->max_flex_bitmask_num =
-   fdir_info.max_flex_bitmask_num;
-   return retval;
-   } else if (filter_op == RTE_ETH_FILTER_GET) {
-   int retval;
-   struct rte_eth_hash_filter_info f_info;
-   struct rte_eth_hash_filter_info_v22 *f_info_v22 =
-   (struct rte_eth_hash_filter_info_v22 *)arg;
-
-   f_info.info_type = f_info_v22->info_type;
-   retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
- filter_op, (void *)&f_info);
-
-   switch (f_info_v22->info_type) {
-   case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-   f_info_v22->info.enable = f_info.info.enable;
-   break;
-   case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-   f_info_v22->info.global_conf.hash_func =
-   f_info.info.global_conf.hash_func;
-   f_info_v22->info.global_conf.sym_hash_enable_mask[0] =
-  

Re: [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function

2018-02-27 Thread Ferruh Yigit
On 2/27/2018 10:29 AM, Kirill Rybalchenko wrote:
> In 18.02 release the ABI of ethdev component was changed.
> To keep compatibility with previous versions of the library
> the versioning of rte_eth_dev_filter_ctrl function was implemented.
> As soon as deprecation note was issued in 18.02 release, there is
> no need to keep compatibility with previous versions.
> Remove the versioning of rte_eth_dev_filter_ctrl function.
> 
> Signed-off-by: Kirill Rybalchenko 
> ---
>  lib/librte_ether/rte_ethdev.c | 155 
> +-

Hi Kirill,

You need to update the .map file and remove the deprecation notice in this patch.

Thanks,
ferruh


[dpdk-dev] [PATCH] maintainers: resign from GSO lib maintenance

2018-02-27 Thread Mark Kavanagh
I will not be directly working on the DPDK project anymore.

Signed-off-by: Mark Kavanagh 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a646ca3..8fa79b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -792,7 +792,6 @@ F: doc/guides/prog_guide/generic_receive_offload_lib.rst
 
 Generic Segmentation Offload
 M: Jiayu Hu 
-M: Mark Kavanagh 
 F: lib/librte_gso/
 F: doc/guides/prog_guide/generic_segmentation_offload_lib.rst
 
-- 
1.9.3



Re: [dpdk-dev] [RFC 1/3] vhost: invalidate vring addresses in cleanup_vq()

2018-02-27 Thread Jens Freimann

On Thu, Feb 22, 2018 at 07:19:08PM +0100, Maxime Coquelin wrote:

When cleaning up the virtqueue, we also need to invalidate its
addresses to be sure outdated addresses won't be used later.

Signed-off-by: Maxime Coquelin 
---
lib/librte_vhost/vhost.c  | 6 --
lib/librte_vhost/vhost.h  | 4 +++-
lib/librte_vhost/vhost_user.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f6f12a03b..e4281cf67 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -69,12 +69,14 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
}

void
-cleanup_vq(struct vhost_virtqueue *vq, int destroy)
+cleanup_vq(struct virtio_net *dev, struct vhost_virtqueue *vq, int destroy)
{
if ((vq->callfd >= 0) && (destroy != 0))
close(vq->callfd);
if (vq->kickfd >= 0)
close(vq->kickfd);
+
+   vring_invalidate(dev, vq);
}

/*
@@ -89,7 +91,7 @@ cleanup_device(struct virtio_net *dev, int destroy)
vhost_backend_cleanup(dev);

for (i = 0; i < dev->nr_vring; i++)
-   cleanup_vq(dev->virtqueue[i], destroy);
+   cleanup_vq(dev, dev->virtqueue[i], destroy);
}

void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 58aec2e0d..4ebf84bec 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -241,6 +241,7 @@ struct virtio_net {
struct guest_page   *guest_pages;

int slave_req_fd;
+   uint8_t virtio_status;


Belongs into other patch?

Apart from that 
Reviewed-by: Jens Freimann  


regards,
Jens 


Re: [dpdk-dev] [PATCH 1/5] lib/ethdev: support for inline IPsec events

2018-02-27 Thread Anoob Joseph

Hi Radu,

Please see inline.

Thanks,

Anoob


On 27/02/18 15:49, Nicolau, Radu wrote:

-Original Message-
From: Anoob Joseph [mailto:anoob.jos...@caviumnetworks.com]
Sent: Tuesday, February 27, 2018 6:57 AM
To: Nicolau, Radu ; Akhil Goyal
; Doherty, Declan 
Cc: Jerin Jacob ; Narayana Prasad
; Nelio Laranjeiro
; dev@dpdk.org
Subject: Re: [PATCH 1/5] lib/ethdev: support for inline IPsec events

Hi Radu,

Please see inline.

Thanks,
Anoob

On 26/02/18 15:05, Nicolau, Radu wrote:

-Original Message-
From: Anoob Joseph [mailto:anoob.jos...@caviumnetworks.com]
Sent: Wednesday, February 21, 2018 5:37 AM
To: Akhil Goyal ; Doherty, Declan
; Nicolau, Radu 
Cc: Anoob Joseph ; Jerin Jacob
; Narayana Prasad
; Nelio Laranjeiro
; dev@dpdk.org
Subject: [PATCH 1/5] lib/ethdev: support for inline IPsec events

Adding support for IPsec events in rte_eth_event framework. In inline
IPsec offload, the per packet protocol defined variables, like ESN,
would be managed by PMD. In such cases, PMD would need IPsec events
to notify application about various conditions like, ESN overflow.

Signed-off-by: Anoob Joseph 
---
   lib/librte_ether/rte_ethdev.h | 22 ++
   1 file changed, 22 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h
b/lib/librte_ether/rte_ethdev.h index 0361533..4e4e18d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2438,6 +2438,27 @@ int
   rte_eth_tx_done_cleanup(uint16_t port_id, uint16_t queue_id,
uint32_t free_cnt);

   /**
+ * Subtypes for IPsec offload events raised by eth device.
+ */
+enum rte_eth_event_ipsec_subtype {
+   RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW,
+   /** Sequence number overflow in security offload */
+   RTE_ETH_EVENT_IPSEC_MAX
+   /** Max value of this enum */
+};

I would add some more events to the list (to make it look less like a very

specific case implementation): crypto/auth failed and undefined/unspecified
being the most obvious.

Apart from this, the patchset looks fine.

Understood your point. But crypto/auth failed would be per packet, right?
How are we handling such error cases presently? Just want to make sure we
are not adding two error reporting mechanisms.

The only reason for my suggestion was to keep the API as flexible and generic 
as possible.

I agree to your suggestion.

For the inline crypto on ixgbe we only flag the mbuf with the security error 
flag, but no extra info is added. I guess we can have a ipsec crypto error 
event with a list of failed mbufs or similar. In any case, it's just a 
suggestion.
Do you think having a crypto error event with the failed mbufs would be
useful? If yes, I can add that. While considering other SA-specific
events, there are two more that we may need to consider:

1) Byte expiry of SA [1]
2) Time expiry of SA [1]

Shall I add these events? Or do we need to make that a separate patch,
considering that they would need an entry in conf to actually be of any
use?


[1] https://tools.ietf.org/html/rfc4301#page-37



+
+/**
+ * Descriptor for IPsec event. Used by eth dev to send extra
+information of the
+ * event.
+ */
+struct rte_eth_event_ipsec_desc {
+   enum rte_eth_event_ipsec_subtype stype;
+   /** Type of IPsec event */
+   uint64_t md;
+   /** Event specific metadata */
+};
+
+/**
* The eth device event type for interrupt, and maybe others in the

future.

*/
   enum rte_eth_event_type {
@@ -2448,6 +2469,7 @@ enum rte_eth_event_type {
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by

PF */

+   RTE_ETH_EVENT_IPSEC,/**< IPsec offload related event */
RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
RTE_ETH_EVENT_NEW,  /**< port is probed */
--
2.7.4
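
If the SA expiry events discussed above were accepted, the proposed enum
might grow along these lines. This is a hypothetical sketch only: the two
expiry subtypes illustrate the RFC 4301 lifetimes mentioned in the thread
and are not part of any submitted code.

```c
#include <assert.h>

/* Hypothetical extension of the enum proposed in the patch; only
 * ESN_OVERFLOW and MAX come from the patch, the SA expiry subtypes
 * are illustrations of the RFC 4301 byte/time lifetimes. */
enum rte_eth_event_ipsec_subtype {
	RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW,   /* from the patch */
	RTE_ETH_EVENT_IPSEC_SA_BYTE_EXPIRY, /* hypothetical: byte lifetime hit */
	RTE_ETH_EVENT_IPSEC_SA_TIME_EXPIRY, /* hypothetical: time lifetime hit */
	RTE_ETH_EVENT_IPSEC_MAX             /* max value of this enum */
};
```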




Re: [dpdk-dev] [RFC 1/3] vhost: invalidate vring addresses in cleanup_vq()

2018-02-27 Thread Maxime Coquelin

Hi Jens,

On 02/27/2018 12:22 PM, Jens Freimann wrote:

On Thu, Feb 22, 2018 at 07:19:08PM +0100, Maxime Coquelin wrote:

When cleaning-up the virtqueue, we also need to invalidate its
addresses to be sure outdated addresses won't be used later.

Signed-off-by: Maxime Coquelin 
---
lib/librte_vhost/vhost.c  | 6 --
lib/librte_vhost/vhost.h  | 4 +++-
lib/librte_vhost/vhost_user.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f6f12a03b..e4281cf67 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -69,12 +69,14 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct 
vhost_virtqueue *vq,

}

void
-cleanup_vq(struct vhost_virtqueue *vq, int destroy)
+cleanup_vq(struct virtio_net *dev, struct vhost_virtqueue *vq, int 
destroy)

{
if ((vq->callfd >= 0) && (destroy != 0))
    close(vq->callfd);
if (vq->kickfd >= 0)
    close(vq->kickfd);
+
+    vring_invalidate(dev, vq);
}

/*
@@ -89,7 +91,7 @@ cleanup_device(struct virtio_net *dev, int destroy)
vhost_backend_cleanup(dev);

for (i = 0; i < dev->nr_vring; i++)
-    cleanup_vq(dev->virtqueue[i], destroy);
+    cleanup_vq(dev, dev->virtqueue[i], destroy);
}

void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 58aec2e0d..4ebf84bec 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -241,6 +241,7 @@ struct virtio_net {
struct guest_page   *guest_pages;

int    slave_req_fd;
+    uint8_t    virtio_status;


Belongs into other patch?


Oh, right! I squashed in wrong commit.


Apart from that Reviewed-by: Jens Freimann 


Is your r-b for the full series or this single patch?

Thanks!
Maxime

regards,
Jens


Re: [dpdk-dev] [RFC 1/3] vhost: invalidate vring addresses in cleanup_vq()

2018-02-27 Thread Jens Freimann

On Tue, Feb 27, 2018 at 12:44:08PM +0100, Maxime Coquelin wrote:

Hi Jens,

On 02/27/2018 12:22 PM, Jens Freimann wrote:

On Thu, Feb 22, 2018 at 07:19:08PM +0100, Maxime Coquelin wrote:

[...]

int    slave_req_fd;
+    uint8_t    virtio_status;


Belongs into other patch?


Oh, right! I squashed in wrong commit.


Apart from that Reviewed-by: Jens Freimann 


Is your r-b for the full series or this single patch?


For this one, but I'll review the other patches today as well. 


regards,
Jens 


Thanks!
Maxime

regards,
Jens


[dpdk-dev] [Bug 17] vhost example VLAN offloading not working on igb tx

2018-02-27 Thread bugzilla
https://dpdk.org/tracker/show_bug.cgi?id=17

Bug ID: 17
   Summary: vhost example VLAN offloading not working on igb tx
   Product: DPDK
   Version: 17.05
  Hardware: x86
OS: Linux
Status: CONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: henning.sch...@siemens.com
  Target Milestone: ---

The igb driver does not send any packets when requesting PKT_TX_VLAN_PKT. While
the xmit-function returns success (number of tx pkts) no packets actually leave
the NIC.


Steps to reproduce:
Get an Intel I350 and run the vhost-example application, like that

  ./vhost-switch -c 3 -n 1 -m 1400 -w :04:00.0 -- -p 1  --socket-file
/tmp/sock0

Attach a qemu and send a few pkts.

Expected Result:
tcpdump on a remote machine (direct cable connection) will see the pkts coming
in.

Actual Result:
No Packets leaving the NIC.

Additional Information:
lspci of the NIC
04:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
(rev 01)
Flags: fast devsel, IRQ 16
Memory at caa2 (32-bit, non-prefetchable) [disabled] [size=128K]
I/O ports at 3020 [disabled] [size=32]
Memory at caa44000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable- Count=10 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 00-0a-cd-ff-ff-2d-ca-dc
Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
Capabilities: [1a0] Transaction Processing Hints
Capabilities: [1c0] Latency Tolerance Reporting
Capabilities: [1d0] Access Control Services
Kernel driver in use: vfio-pci
Kernel modules: igb

Doing the VLAN-tagging in SW works as expected. Modified vhost to not set
m->ol_flags |= PKT_TX_VLAN_PKT and instead called rte_vlan_insert(&m);
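
The software workaround can be sketched on a raw frame buffer. This is a
hypothetical standalone helper for illustration; the real rte_vlan_insert()
operates on an rte_mbuf and also handles headroom and VLAN-already-present
checks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software 802.1Q tag insertion on a raw Ethernet frame, mimicking what
 * rte_vlan_insert() does in software. The buffer must have at least
 * 4 spare bytes beyond `len`. Returns the new frame length. */
static size_t vlan_insert_sw(uint8_t *frame, size_t len, uint16_t tci)
{
	/* Make room after the 12 bytes of destination + source MAC. */
	memmove(frame + 16, frame + 12, len - 12);
	frame[12] = 0x81;                  /* TPID 0x8100: 802.1Q tag */
	frame[13] = 0x00;
	frame[14] = (uint8_t)(tci >> 8);   /* PCP/DEI + upper VLAN ID bits */
	frame[15] = (uint8_t)(tci & 0xff); /* lower VLAN ID bits */
	return len + 4;                    /* frame grew by the 4-byte tag */
}
```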

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] [PATCH 00/14] net/sfc: support flow API for tunnels

2018-02-27 Thread Andrew Rybchenko
Update base driver and the PMD itself to support flow API
patterns for tunnels: VXLAN, NVGRE and Geneve.

Applicable to SFN8xxx NICs with full-feature firmware variant running.

Andrew Rybchenko (1):
  doc: add net/sfc flow API support for tunnels

Roman Zhukov (12):
  net/sfc/base: support filters for encapsulated packets
  net/sfc/base: support VNI/VSID and inner frame local MAC
  net/sfc/base: distinguish filters for encapsulated packets
  net/sfc: add VXLAN in flow API filters support
  net/sfc: add NVGRE in flow API filters support
  net/sfc: add GENEVE in flow API filters support
  net/sfc: add inner frame ETH in flow API filters support
  net/sfc: add infrastructure to make many filters from flow
  net/sfc: multiply of specs with an unknown EtherType
  net/sfc: multiply of specs w/o inner frame destination MAC
  net/sfc: multiply of specs with an unknown destination MAC
  net/sfc: avoid creation of ineffective flow rules

Vijay Srivastava (1):
  net/sfc/base: support VXLAN filter creation

 doc/guides/nics/sfc_efx.rst|   28 +-
 doc/guides/rel_notes/release_18_05.rst |6 +
 drivers/net/sfc/base/ef10_filter.c |  100 +++-
 drivers/net/sfc/base/efx.h |   20 +
 drivers/net/sfc/base/efx_filter.c  |   39 +-
 drivers/net/sfc/sfc_flow.c | 1001 ++--
 drivers/net/sfc/sfc_flow.h |   19 +-
 7 files changed, 1161 insertions(+), 52 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 04/14] net/sfc/base: distinguish filters for encapsulated packets

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Add a filter match flag to distinguish filters applied only to
encapsulated packets.

The set of match flags should make it possible to determine whether
a filter is supported or not. The problem is that if a specification
has a supported set of outer match flags and specifies encapsulation
without any inner flags, the check says that it is supported and
filter insertion is performed. However, the encapsulated traffic is
not actually filtered. A new flag is added to solve this problem and
to separate the filters for encapsulated packets.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Mark Spender 
---
 drivers/net/sfc/base/ef10_filter.c | 19 +--
 drivers/net/sfc/base/efx.h |  5 +
 drivers/net/sfc/base/efx_filter.c  |  3 ++-
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/base/ef10_filter.c 
b/drivers/net/sfc/base/ef10_filter.c
index 8026b1a..54e1c35 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -172,6 +172,7 @@ efx_mcdi_filter_op_add(
efx_mcdi_req_t req;
uint8_t payload[MAX(MC_CMD_FILTER_OP_EXT_IN_LEN,
MC_CMD_FILTER_OP_EXT_OUT_LEN)];
+   efx_filter_match_flags_t match_flags;
efx_rc_t rc;
 
memset(payload, 0, sizeof (payload));
@@ -181,6 +182,12 @@ efx_mcdi_filter_op_add(
req.emr_out_buf = payload;
req.emr_out_length = MC_CMD_FILTER_OP_EXT_OUT_LEN;
 
+   /*
+* Remove match flag for encapsulated filters that does not correspond
+* to the MCDI match flags
+*/
+   match_flags = spec->efs_match_flags & ~EFX_FILTER_MATCH_ENCAP_TYPE;
+
switch (filter_op) {
case MC_CMD_FILTER_OP_IN_OP_REPLACE:
MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_HANDLE_LO,
@@ -201,7 +208,7 @@ efx_mcdi_filter_op_add(
MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_PORT_ID,
EVB_PORT_ID_ASSIGNED);
MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_MATCH_FIELDS,
-   spec->efs_match_flags);
+   match_flags);
MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_DEST,
MC_CMD_FILTER_OP_EXT_IN_RX_DEST_HOST);
MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_QUEUE,
@@ -1003,13 +1010,17 @@ ef10_filter_supported_filters(
EFX_FILTER_MATCH_IFRM_LOC_MAC |
EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST |
EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST |
+   EFX_FILTER_MATCH_ENCAP_TYPE |
EFX_FILTER_MATCH_UNKNOWN_MCAST_DST |
EFX_FILTER_MATCH_UNKNOWN_UCAST_DST);
 
/*
 * Two calls to MC_CMD_GET_PARSER_DISP_INFO are needed: one to get the
 * list of supported filters for ordinary packets, and then another to
-* get the list of supported filters for encapsulated packets.
+* get the list of supported filters for encapsulated packets. To
+* distinguish the second list from the first, the
+* EFX_FILTER_MATCH_ENCAP_TYPE flag is added to each filter for
+* encapsulated packets.
 */
rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length, B_FALSE,
&mcdi_list_length);
@@ -1037,6 +1048,10 @@ ef10_filter_supported_filters(
no_space = B_TRUE;
else
goto fail2;
+   } else {
+   for (i = next_buf_idx;
+   i < next_buf_idx + mcdi_encap_list_length; i++)
+   buffer[i] |= EFX_FILTER_MATCH_ENCAP_TYPE;
}
} else {
mcdi_encap_list_length = 0;
diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h
index c0e4218..df1e23a 100644
--- a/drivers/net/sfc/base/efx.h
+++ b/drivers/net/sfc/base/efx.h
@@ -2323,6 +2323,11 @@ typedef uint8_t efx_filter_flags_t;
 #defineEFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST 0x0100
 /* For encapsulated packets, match all unicast inner frames */
 #defineEFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST 0x0200
+/*
+ * Match by encap type, this flag does not correspond to
+ * the MCDI match flags and any unoccupied value may be used
+ */
+#defineEFX_FILTER_MATCH_ENCAP_TYPE 0x2000
 /* Match otherwise-unmatched multicast and broadcast packets */
 #defineEFX_FILTER_MATCH_UNKNOWN_MCAST_DST  0x4000
 /* Match otherwise-unmatched unicast packets */
diff --git a/drivers/net/sfc/base/efx_filter.c 
b/drivers/net/sfc/base/efx_filter.c
index 4bce050..a37b5c1 100644
--- a/drivers/net/sfc/base/efx_filter.c
+++ b/drivers/net/sfc/base/efx_filter.c
@@ -412,7 +412,7 @@ efx_filter_spec_set_encap_type(
__inefx_tunnel_protocol_t encap_type,
__inefx_filter_inner_frame_match_t inner_frame_match)
 {
-   uint32_t match_flags = 0;
+   uint32_t match_flags = EFX_FILTER_MATCH_ENCAP_TYPE;

[dpdk-dev] [PATCH 06/14] net/sfc: add NVGRE in flow API filters support

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Exact match of the virtual subnet ID is supported by the parser.
IP protocol match is enforced to GRE.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
Reviewed-by: Andy Moreton 
---
 doc/guides/nics/sfc_efx.rst |  2 ++
 drivers/net/sfc/sfc_flow.c  | 68 -
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index 5a4b2a6..05dacb3 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -168,6 +168,8 @@ Supported pattern items:
 
 - VXLAN (exact match of VXLAN network identifier)
 
+- NVGRE (exact match of virtual subnet ID)
+
 Supported actions:
 
 - VOID
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 20ba69d..126ec9b 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -58,6 +58,7 @@ static sfc_flow_item_parse sfc_flow_parse_ipv6;
 static sfc_flow_item_parse sfc_flow_parse_tcp;
 static sfc_flow_item_parse sfc_flow_parse_udp;
 static sfc_flow_item_parse sfc_flow_parse_vxlan;
+static sfc_flow_item_parse sfc_flow_parse_nvgre;
 
 static boolean_t
 sfc_flow_is_zero(const uint8_t *buf, unsigned int size)
@@ -719,10 +720,17 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct 
rte_flow_item *item,
"in VxLAN pattern");
return -rte_errno;
 
+   case EFX_IPPROTO_GRE:
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "Outer IP header protocol must be GRE "
+   "in NVGRE pattern");
+   return -rte_errno;
+
default:
rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ITEM, item,
-   "Only VxLAN tunneling patterns "
+   "Only VxLAN/NVGRE tunneling patterns "
"are supported");
return -rte_errno;
}
@@ -823,6 +831,57 @@ sfc_flow_parse_vxlan(const struct rte_flow_item *item,
return rc;
 }
 
+/**
+ * Convert NVGRE item to EFX filter specification.
+ *
+ * @param item[in]
+ *   Item specification. Only virtual subnet ID field is supported.
+ *   If the mask is NULL, default mask will be used.
+ *   Ranging is not supported.
+ * @param efx_spec[in, out]
+ *   EFX filter specification to update.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_parse_nvgre(const struct rte_flow_item *item,
+efx_filter_spec_t *efx_spec,
+struct rte_flow_error *error)
+{
+   int rc;
+   const struct rte_flow_item_nvgre *spec = NULL;
+   const struct rte_flow_item_nvgre *mask = NULL;
+   const struct rte_flow_item_nvgre supp_mask = {
+   .tni = { 0xff, 0xff, 0xff }
+   };
+
+   rc = sfc_flow_parse_init(item,
+(const void **)&spec,
+(const void **)&mask,
+&supp_mask,
+&rte_flow_item_nvgre_mask,
+sizeof(struct rte_flow_item_nvgre),
+error);
+   if (rc != 0)
+   return rc;
+
+   rc = sfc_flow_set_match_flags_for_encap_pkts(item, efx_spec,
+EFX_IPPROTO_GRE, error);
+   if (rc != 0)
+   return rc;
+
+   efx_spec->efs_encap_type = EFX_TUNNEL_PROTOCOL_NVGRE;
+   efx_spec->efs_match_flags |= EFX_FILTER_MATCH_ENCAP_TYPE;
+
+   if (spec == NULL)
+   return 0;
+
+   rc = sfc_flow_set_efx_spec_vni_or_vsid(efx_spec, spec->tni,
+  mask->tni, item, error);
+
+   return rc;
+}
+
 static const struct sfc_flow_item sfc_flow_items[] = {
{
.type = RTE_FLOW_ITEM_TYPE_VOID,
@@ -872,6 +931,12 @@ static const struct sfc_flow_item sfc_flow_items[] = {
.layer = SFC_FLOW_ITEM_START_LAYER,
.parse = sfc_flow_parse_vxlan,
},
+   {
+   .type = RTE_FLOW_ITEM_TYPE_NVGRE,
+   .prev_layer = SFC_FLOW_ITEM_L3,
+   .layer = SFC_FLOW_ITEM_START_LAYER,
+   .parse = sfc_flow_parse_nvgre,
+   },
 };
 
 /*
@@ -980,6 +1045,7 @@ sfc_flow_parse_pattern(const struct rte_flow_item 
pattern[],
break;
 
case RTE_FLOW_ITEM_TYPE_VXLAN:
+   case RTE_FLOW_ITEM_TYPE_NVGRE:
if (is_ifrm) {
rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ITEM,
-- 
2.7.4



[dpdk-dev] [PATCH 03/14] net/sfc/base: support VXLAN filter creation

2018-02-27 Thread Andrew Rybchenko
From: Vijay Srivastava 

Signed-off-by: Vijay Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/net/sfc/base/efx.h|  7 +++
 drivers/net/sfc/base/efx_filter.c | 36 
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h
index ac589f9..c0e4218 100644
--- a/drivers/net/sfc/base/efx.h
+++ b/drivers/net/sfc/base/efx.h
@@ -2462,6 +2462,13 @@ efx_filter_spec_set_encap_type(
__inefx_tunnel_protocol_t encap_type,
__inefx_filter_inner_frame_match_t inner_frame_match);
 
+extern __checkReturn   efx_rc_t
+efx_filter_spec_set_vxlan_full(
+   __inout efx_filter_spec_t *spec,
+   __inconst uint8_t *vxlan_id,
+   __inconst uint8_t *inner_addr,
+   __inconst uint8_t *outer_addr);
+
 #if EFSYS_OPT_RX_SCALE
 extern __checkReturn   efx_rc_t
 efx_filter_spec_set_rss_context(
diff --git a/drivers/net/sfc/base/efx_filter.c 
b/drivers/net/sfc/base/efx_filter.c
index b92541a..4bce050 100644
--- a/drivers/net/sfc/base/efx_filter.c
+++ b/drivers/net/sfc/base/efx_filter.c
@@ -462,6 +462,42 @@ efx_filter_spec_set_encap_type(
return (rc);
 }
 
+/*
+ * Specify inner and outer Ethernet address and VXLAN ID in filter
+ * specification.
+ */
+   __checkReturn   efx_rc_t
+efx_filter_spec_set_vxlan_full(
+   __inout efx_filter_spec_t *spec,
+   __inconst uint8_t *vxlan_id,
+   __inconst uint8_t *inner_addr,
+   __inconst uint8_t *outer_addr)
+{
+   EFSYS_ASSERT3P(spec, !=, NULL);
+   EFSYS_ASSERT3P(vxlan_id, !=, NULL);
+   EFSYS_ASSERT3P(inner_addr, !=, NULL);
+   EFSYS_ASSERT3P(outer_addr, !=, NULL);
+
+   if ((inner_addr == NULL) && (outer_addr == NULL))
+   return (EINVAL);
+
+   if (vxlan_id != NULL) {
+   spec->efs_match_flags |= EFX_FILTER_MATCH_VNI_OR_VSID;
+   memcpy(spec->efs_vni_or_vsid, vxlan_id, EFX_VNI_OR_VSID_LEN);
+   }
+   if (outer_addr != NULL) {
+   spec->efs_match_flags |= EFX_FILTER_MATCH_LOC_MAC;
+   memcpy(spec->efs_loc_mac, outer_addr, EFX_MAC_ADDR_LEN);
+   }
+   if (inner_addr != NULL) {
+   spec->efs_match_flags |= EFX_FILTER_MATCH_IFRM_LOC_MAC;
+   memcpy(spec->efs_ifrm_loc_mac, inner_addr, EFX_MAC_ADDR_LEN);
+   }
+   spec->efs_encap_type = EFX_TUNNEL_PROTOCOL_VXLAN;
+
+   return (0);
+}
+
 #if EFSYS_OPT_RX_SCALE
__checkReturn   efx_rc_t
 efx_filter_spec_set_rss_context(
-- 
2.7.4



[dpdk-dev] [PATCH 14/14] doc: add net/sfc flow API support for tunnels

2018-02-27 Thread Andrew Rybchenko
Signed-off-by: Andrew Rybchenko 
---
 doc/guides/rel_notes/release_18_05.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_05.rst 
b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..894f636 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -41,6 +41,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Updated Solarflare network PMD.**
+
+  Updated the sfc_efx driver including the following changes:
+
+  * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
+
 
 API Changes
 ---
-- 
2.7.4



[dpdk-dev] [PATCH 02/14] net/sfc/base: support VNI/VSID and inner frame local MAC

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

This adds support for matching on the VNI/VSID and inner frame local MAC
fields in VXLAN, GENEVE, or NVGRE packets.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/base/ef10_filter.c | 18 ++
 drivers/net/sfc/base/efx.h |  8 
 2 files changed, 26 insertions(+)

diff --git a/drivers/net/sfc/base/ef10_filter.c 
b/drivers/net/sfc/base/ef10_filter.c
index f643cdb..8026b1a 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -118,6 +118,10 @@ ef10_filter_init(
MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_OUTER_VLAN));
EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IP_PROTO ==
MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_IP_PROTO));
+   EFX_STATIC_ASSERT(EFX_FILTER_MATCH_VNI_OR_VSID ==
+   MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_VNI_OR_VSID));
+   EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IFRM_LOC_MAC ==
+   MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_IFRM_DST_MAC));
EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST ==
MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_IFRM_UNKNOWN_MCAST_DST));
EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST ==
@@ -290,6 +294,12 @@ efx_mcdi_filter_op_add(
rc = EINVAL;
goto fail2;
}
+
+   memcpy(MCDI_IN2(req, uint8_t, FILTER_OP_EXT_IN_VNI_OR_VSID),
+   spec->efs_vni_or_vsid, EFX_VNI_OR_VSID_LEN);
+
+   memcpy(MCDI_IN2(req, uint8_t, FILTER_OP_EXT_IN_IFRM_DST_MAC),
+   spec->efs_ifrm_loc_mac, EFX_MAC_ADDR_LEN);
}
 
efx_mcdi_execute(enp, &req);
@@ -413,6 +423,12 @@ ef10_filter_equal(
return (B_FALSE);
if (left->efs_encap_type != right->efs_encap_type)
return (B_FALSE);
+   if (memcmp(left->efs_vni_or_vsid, right->efs_vni_or_vsid,
+   EFX_VNI_OR_VSID_LEN))
+   return (B_FALSE);
+   if (memcmp(left->efs_ifrm_loc_mac, right->efs_ifrm_loc_mac,
+   EFX_MAC_ADDR_LEN))
+   return (B_FALSE);
 
return (B_TRUE);
 
@@ -983,6 +999,8 @@ ef10_filter_supported_filters(
EFX_FILTER_MATCH_LOC_MAC | EFX_FILTER_MATCH_LOC_PORT |
EFX_FILTER_MATCH_ETHER_TYPE | EFX_FILTER_MATCH_INNER_VID |
EFX_FILTER_MATCH_OUTER_VID | EFX_FILTER_MATCH_IP_PROTO |
+   EFX_FILTER_MATCH_VNI_OR_VSID |
+   EFX_FILTER_MATCH_IFRM_LOC_MAC |
EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST |
EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST |
EFX_FILTER_MATCH_UNKNOWN_MCAST_DST |
diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h
index fe996e7..ac589f9 100644
--- a/drivers/net/sfc/base/efx.h
+++ b/drivers/net/sfc/base/efx.h
@@ -413,6 +413,8 @@ typedef enum efx_link_mode_e {
 
 #defineEFX_MAC_ADDR_LEN 6
 
+#defineEFX_VNI_OR_VSID_LEN 3
+
 #defineEFX_MAC_ADDR_IS_MULTICAST(_address) (((uint8_t *)_address)[0] & 
0x01)
 
 #defineEFX_MAC_MULTICAST_LIST_MAX  256
@@ -2313,6 +2315,10 @@ typedef uint8_t efx_filter_flags_t;
 #defineEFX_FILTER_MATCH_OUTER_VID  0x0100
 /* Match by IP transport protocol */
 #defineEFX_FILTER_MATCH_IP_PROTO   0x0200
+/* Match by VNI or VSID */
+#defineEFX_FILTER_MATCH_VNI_OR_VSID0x0800
+/* For encapsulated packets, match by inner frame local MAC address */
+#defineEFX_FILTER_MATCH_IFRM_LOC_MAC   0x0001
 /* For encapsulated packets, match all multicast inner frames */
 #defineEFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST 0x0100
 /* For encapsulated packets, match all unicast inner frames */
@@ -2359,6 +2365,8 @@ typedef struct efx_filter_spec_s {
uint16_tefs_rem_port;
efx_oword_t efs_rem_host;
efx_oword_t efs_loc_host;
+   uint8_t efs_vni_or_vsid[EFX_VNI_OR_VSID_LEN];
+   uint8_t efs_ifrm_loc_mac[EFX_MAC_ADDR_LEN];
 } efx_filter_spec_t;
 
 
-- 
2.7.4



[dpdk-dev] [PATCH 11/14] net/sfc: multiply of specs w/o inner frame destination MAC

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Knowledge of a network identifier is not sufficient to construct a
workable hardware filter for encapsulated traffic. It's obligatory to
specify one of the match flags associated with inner frame destination
MAC. If the address is unknown, then one needs to specify either unknown
unicast or unknown multicast destination match flag.

In terms of RTE flow API, this would require adding multiple flow rules
with corresponding ETH items besides the tunnel item. In order to avoid
such a complication, the patch implements a mechanism to auto-complete
an underlying filter representation of a flow rule in order to create
additional filter specifications featuring the missing match flags.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc_flow.c | 114 -
 drivers/net/sfc/sfc_flow.h |   2 +-
 2 files changed, 113 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 244fcdb..2d45827 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -68,6 +68,10 @@ typedef int (sfc_flow_spec_set_vals)(struct sfc_flow_spec 
*spec,
 unsigned int filters_count_for_one_val,
 struct rte_flow_error *error);
 
+typedef boolean_t (sfc_flow_spec_check)(efx_filter_match_flags_t match,
+   efx_filter_spec_t *spec,
+   struct sfc_filter *filter);
+
 struct sfc_flow_copy_flag {
/* EFX filter specification match flag */
efx_filter_match_flags_t flag;
@@ -75,9 +79,16 @@ struct sfc_flow_copy_flag {
unsigned int vals_count;
/* Function to set values in specifications */
sfc_flow_spec_set_vals *set_vals;
+   /*
+* Function to check that the specification is suitable
+* for adding this match flag
+*/
+   sfc_flow_spec_check *spec_check;
 };
 
 static sfc_flow_spec_set_vals sfc_flow_set_ethertypes;
+static sfc_flow_spec_set_vals sfc_flow_set_ifrm_unknown_dst_flags;
+static sfc_flow_spec_check sfc_flow_check_ifrm_unknown_dst_flags;
 
 static boolean_t
 sfc_flow_is_zero(const uint8_t *buf, unsigned int size)
@@ -1548,12 +1559,98 @@ sfc_flow_set_ethertypes(struct sfc_flow_spec *spec,
return 0;
 }
 
+/**
+ * Set the EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST and
+ * EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST match flags in the same
+ * specifications after copying.
+ *
+ * @param spec[in, out]
+ *   SFC flow specification to update.
+ * @param filters_count_for_one_val[in]
+ *   How many specifications should have the same match flag, what is the
+ *   number of specifications before copying.
+ * @param error[out]
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_set_ifrm_unknown_dst_flags(struct sfc_flow_spec *spec,
+   unsigned int filters_count_for_one_val,
+   struct rte_flow_error *error)
+{
+   unsigned int i;
+   static const efx_filter_match_flags_t vals[] = {
+   EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST,
+   EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST
+   };
+
+   if (filters_count_for_one_val * RTE_DIM(vals) != spec->count) {
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+   "Number of specifications is incorrect while copying "
+   "by inner frame unknown destination flags");
+   return -rte_errno;
+   }
+
+   for (i = 0; i < spec->count; i++) {
+   /* The check above ensures that divisor can't be zero here */
+   spec->filters[i].efs_match_flags |=
+   vals[i / filters_count_for_one_val];
+   }
+
+   return 0;
+}
+
+/**
+ * Check that the following conditions are met:
+ * - the specification corresponds to a filter for encapsulated traffic
+ * - the list of supported filters has a filter
+ *   with EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST flag instead of
+ *   EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST, since this filter will also
+ *   be inserted.
+ *
+ * @param match[in]
+ *   The match flags of filter.
+ * @param spec[in]
+ *   Specification to be supplemented.
+ * @param filter[in]
+ *   SFC filter with list of supported filters.
+ */
+static boolean_t
+sfc_flow_check_ifrm_unknown_dst_flags(efx_filter_match_flags_t match,
+ efx_filter_spec_t *spec,
+ struct sfc_filter *filter)
+{
+   unsigned int i;
+   efx_tunnel_protocol_t encap_type = spec->efs_encap_type;
+   efx_filter_match_flags_t match_mcast_dst;
+
+   if (encap_type == EFX_TUNNEL_PROTOCOL_NONE)
+   return B_FALSE;
+
+   match_mcast_dst =
+   (match & ~EFX_FI

[dpdk-dev] [PATCH 07/14] net/sfc: add GENEVE in flow API filters support

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Exact match of the virtual network identifier is supported by the parser.
The IP protocol match is enforced to UDP.
Only the Ethernet (0x6558) protocol type is supported.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
Reviewed-by: Andy Moreton 
---
 doc/guides/nics/sfc_efx.rst |  3 ++
 drivers/net/sfc/sfc_flow.c  | 80 +++--
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index 05dacb3..943fe55 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -168,6 +168,9 @@ Supported pattern items:
 
 - VXLAN (exact match of VXLAN network identifier)
 
+- GENEVE (exact match of virtual network identifier, only Ethernet (0x6558)
+  protocol type is supported)
+
 - NVGRE (exact match of virtual subnet ID)
 
 Supported actions:
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 126ec9b..efdc664 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -58,6 +58,7 @@ static sfc_flow_item_parse sfc_flow_parse_ipv6;
 static sfc_flow_item_parse sfc_flow_parse_tcp;
 static sfc_flow_item_parse sfc_flow_parse_udp;
 static sfc_flow_item_parse sfc_flow_parse_vxlan;
+static sfc_flow_item_parse sfc_flow_parse_geneve;
 static sfc_flow_item_parse sfc_flow_parse_nvgre;
 
 static boolean_t
@@ -717,7 +718,7 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct 
rte_flow_item *item,
rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ITEM, item,
"Outer IP header protocol must be UDP "
-   "in VxLAN pattern");
+   "in VxLAN/GENEVE pattern");
return -rte_errno;
 
case EFX_IPPROTO_GRE:
@@ -730,7 +731,7 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct 
rte_flow_item *item,
default:
rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ITEM, item,
-   "Only VxLAN/NVGRE tunneling patterns "
+   "Only VxLAN/GENEVE/NVGRE tunneling patterns "
"are supported");
return -rte_errno;
}
@@ -832,6 +833,74 @@ sfc_flow_parse_vxlan(const struct rte_flow_item *item,
 }
 
 /**
+ * Convert GENEVE item to EFX filter specification.
+ *
+ * @param item[in]
+ *   Item specification. Only Virtual Network Identifier and protocol type
+ *   fields are supported. But protocol type can be only Ethernet (0x6558).
+ *   If the mask is NULL, default mask will be used.
+ *   Ranging is not supported.
+ * @param efx_spec[in, out]
+ *   EFX filter specification to update.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_parse_geneve(const struct rte_flow_item *item,
+ efx_filter_spec_t *efx_spec,
+ struct rte_flow_error *error)
+{
+   int rc;
+   const struct rte_flow_item_geneve *spec = NULL;
+   const struct rte_flow_item_geneve *mask = NULL;
+   const struct rte_flow_item_geneve supp_mask = {
+   .protocol = RTE_BE16(0x),
+   .vni = { 0xff, 0xff, 0xff }
+   };
+
+   rc = sfc_flow_parse_init(item,
+(const void **)&spec,
+(const void **)&mask,
+&supp_mask,
+&rte_flow_item_geneve_mask,
+sizeof(struct rte_flow_item_geneve),
+error);
+   if (rc != 0)
+   return rc;
+
+   rc = sfc_flow_set_match_flags_for_encap_pkts(item, efx_spec,
+EFX_IPPROTO_UDP, error);
+   if (rc != 0)
+   return rc;
+
+   efx_spec->efs_encap_type = EFX_TUNNEL_PROTOCOL_GENEVE;
+   efx_spec->efs_match_flags |= EFX_FILTER_MATCH_ENCAP_TYPE;
+
+   if (spec == NULL)
+   return 0;
+
+   if (mask->protocol == supp_mask.protocol) {
+   if (spec->protocol != rte_cpu_to_be_16(ETHER_TYPE_TEB)) {
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "GENEVE encap. protocol must be Ethernet "
+   "(0x6558) in the GENEVE pattern item");
+   return -rte_errno;
+   }
+   } else if (mask->protocol != 0) {
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "Unsupported mask for GENEVE encap. protocol");
+   return -rte_errno;
+   }
+
+   rc = sfc_flow_set_efx_spec_vni_or_vsid(efx_spec, spec->vni,
+  

[dpdk-dev] [PATCH 01/14] net/sfc/base: support filters for encapsulated packets

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

This adds filters for encapsulated packets to the list
returned by ef10_filter_supported_filters().

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/base/ef10_filter.c | 65 --
 1 file changed, 55 insertions(+), 10 deletions(-)

diff --git a/drivers/net/sfc/base/ef10_filter.c 
b/drivers/net/sfc/base/ef10_filter.c
index 27b5998..f643cdb 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -890,6 +890,7 @@ efx_mcdi_get_parser_disp_info(
__inefx_nic_t *enp,
__out_ecount(buffer_length) uint32_t *buffer,
__insize_t buffer_length,
+   __inboolean_t encap,
__out   size_t *list_lengthp)
 {
efx_mcdi_req_t req;
@@ -906,7 +907,8 @@ efx_mcdi_get_parser_disp_info(
req.emr_out_buf = payload;
req.emr_out_length = MC_CMD_GET_PARSER_DISP_INFO_OUT_LENMAX;
 
-   MCDI_IN_SET_DWORD(req, GET_PARSER_DISP_INFO_OUT_OP,
+   MCDI_IN_SET_DWORD(req, GET_PARSER_DISP_INFO_OUT_OP, encap ?
+   MC_CMD_GET_PARSER_DISP_INFO_IN_OP_GET_SUPPORTED_ENCAP_RX_MATCHES :
MC_CMD_GET_PARSER_DISP_INFO_IN_OP_GET_SUPPORTED_RX_MATCHES);
 
efx_mcdi_execute(enp, &req);
@@ -966,28 +968,66 @@ ef10_filter_supported_filters(
__insize_t buffer_length,
__out   size_t *list_lengthp)
 {
-
+   efx_nic_cfg_t *encp = &(enp->en_nic_cfg);
size_t mcdi_list_length;
+   size_t mcdi_encap_list_length;
size_t list_length;
uint32_t i;
+   uint32_t next_buf_idx;
+   size_t next_buf_length;
efx_rc_t rc;
+   boolean_t no_space = B_FALSE;
efx_filter_match_flags_t all_filter_flags =
(EFX_FILTER_MATCH_REM_HOST | EFX_FILTER_MATCH_LOC_HOST |
EFX_FILTER_MATCH_REM_MAC | EFX_FILTER_MATCH_REM_PORT |
EFX_FILTER_MATCH_LOC_MAC | EFX_FILTER_MATCH_LOC_PORT |
EFX_FILTER_MATCH_ETHER_TYPE | EFX_FILTER_MATCH_INNER_VID |
EFX_FILTER_MATCH_OUTER_VID | EFX_FILTER_MATCH_IP_PROTO |
+   EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST |
+   EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST |
EFX_FILTER_MATCH_UNKNOWN_MCAST_DST |
EFX_FILTER_MATCH_UNKNOWN_UCAST_DST);
 
-   rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length,
-   &mcdi_list_length);
+   /*
+* Two calls to MC_CMD_GET_PARSER_DISP_INFO are needed: one to get the
+* list of supported filters for ordinary packets, and then another to
+* get the list of supported filters for encapsulated packets.
+*/
+   rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length, B_FALSE,
+   &mcdi_list_length);
if (rc != 0) {
-   if (rc == ENOSPC) {
-   /* Pass through mcdi_list_length for the list length */
-   *list_lengthp = mcdi_list_length;
+   if (rc == ENOSPC)
+   no_space = B_TRUE;
+   else
+   goto fail1;
+   }
+
+   if (no_space) {
+   next_buf_idx = 0;
+   next_buf_length = 0;
+   } else {
+   EFSYS_ASSERT(mcdi_list_length < buffer_length);
+   next_buf_idx = mcdi_list_length;
+   next_buf_length = buffer_length - mcdi_list_length;
+   }
+
+   if (encp->enc_tunnel_encapsulations_supported != 0) {
+   rc = efx_mcdi_get_parser_disp_info(enp, &buffer[next_buf_idx],
+   next_buf_length, B_TRUE, &mcdi_encap_list_length);
+   if (rc != 0) {
+   if (rc == ENOSPC)
+   no_space = B_TRUE;
+   else
+   goto fail2;
}
-   goto fail1;
+   } else {
+   mcdi_encap_list_length = 0;
+   }
+
+   if (no_space) {
+   *list_lengthp = mcdi_list_length + mcdi_encap_list_length;
+   rc = ENOSPC;
+   goto fail3;
}
 
/*
@@ -1000,9 +1040,10 @@ ef10_filter_supported_filters(
 * of the matches is preserved as they are ordered from highest to
 * lowest priority.
 */
-   EFSYS_ASSERT(mcdi_list_length <= buffer_length);
+   EFSYS_ASSERT(mcdi_list_length + mcdi_encap_list_length <=
+   buffer_length);
list_length = 0;
-   for (i = 0; i < mcdi_list_length; i++) {
+   for (i = 0; i < mcdi_list_length + mcdi_encap_list_length; i++) {
if ((buffer[i] & ~all_filter_flags) == 0) {
buffer[list_length] = buffer[i];
list_length++;
@@ -1013,6 +1054,10 @@ ef10_filter_supported_f

[dpdk-dev] [PATCH 05/14] net/sfc: add VXLAN in flow API filters support

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Exact match of the VXLAN network identifier is supported by the parser.
The IP protocol match is enforced to UDP.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
Reviewed-by: Andy Moreton 
---
 doc/guides/nics/sfc_efx.rst |   2 +
 drivers/net/sfc/sfc_flow.c  | 165 
 2 files changed, 167 insertions(+)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index ccdf5ff..5a4b2a6 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -166,6 +166,8 @@ Supported pattern items:
 
 - UDP (exact match of source/destination ports)
 
+- VXLAN (exact match of VXLAN network identifier)
+
 Supported actions:
 
 - VOID
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 93cdf8f..20ba69d 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -57,6 +57,7 @@ static sfc_flow_item_parse sfc_flow_parse_ipv4;
 static sfc_flow_item_parse sfc_flow_parse_ipv6;
 static sfc_flow_item_parse sfc_flow_parse_tcp;
 static sfc_flow_item_parse sfc_flow_parse_udp;
+static sfc_flow_item_parse sfc_flow_parse_vxlan;
 
 static boolean_t
 sfc_flow_is_zero(const uint8_t *buf, unsigned int size)
@@ -696,6 +697,132 @@ sfc_flow_parse_udp(const struct rte_flow_item *item,
return -rte_errno;
 }
 
+/*
+ * Filters for encapsulated packets match based on the EtherType and IP
+ * protocol in the outer frame.
+ */
+static int
+sfc_flow_set_match_flags_for_encap_pkts(const struct rte_flow_item *item,
+   efx_filter_spec_t *efx_spec,
+   uint8_t ip_proto,
+   struct rte_flow_error *error)
+{
+   if (!(efx_spec->efs_match_flags & EFX_FILTER_MATCH_IP_PROTO)) {
+   efx_spec->efs_match_flags |= EFX_FILTER_MATCH_IP_PROTO;
+   efx_spec->efs_ip_proto = ip_proto;
+   } else if (efx_spec->efs_ip_proto != ip_proto) {
+   switch (ip_proto) {
+   case EFX_IPPROTO_UDP:
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "Outer IP header protocol must be UDP "
+   "in VxLAN pattern");
+   return -rte_errno;
+
+   default:
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "Only VxLAN tunneling patterns "
+   "are supported");
+   return -rte_errno;
+   }
+   }
+
+   if (!(efx_spec->efs_match_flags & EFX_FILTER_MATCH_ETHER_TYPE)) {
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "Outer frame EtherType in pattern with tunneling "
+   "must be set");
+   return -rte_errno;
+   } else if (efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV4 &&
+  efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV6) {
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM, item,
+   "Outer frame EtherType in pattern with tunneling "
+   "must be IPv4 or IPv6");
+   return -rte_errno;
+   }
+
+   return 0;
+}
+
+static int
+sfc_flow_set_efx_spec_vni_or_vsid(efx_filter_spec_t *efx_spec,
+ const uint8_t *vni_or_vsid_val,
+ const uint8_t *vni_or_vsid_mask,
+ const struct rte_flow_item *item,
+ struct rte_flow_error *error)
+{
+   const uint8_t vni_or_vsid_full_mask[EFX_VNI_OR_VSID_LEN] = {
+   0xff, 0xff, 0xff
+   };
+
+   if (memcmp(vni_or_vsid_mask, vni_or_vsid_full_mask,
+  EFX_VNI_OR_VSID_LEN) == 0) {
+   efx_spec->efs_match_flags |= EFX_FILTER_MATCH_VNI_OR_VSID;
+   rte_memcpy(efx_spec->efs_vni_or_vsid, vni_or_vsid_val,
+  EFX_VNI_OR_VSID_LEN);
+   } else if (!sfc_flow_is_zero(vni_or_vsid_mask, EFX_VNI_OR_VSID_LEN)) {
+   rte_flow_error_set(error, EINVAL,
+  RTE_FLOW_ERROR_TYPE_ITEM, item,
+  "Unsupported VNI/VSID mask");
+   return -rte_errno;
+   }
+
+   return 0;
+}
+
+/**
+ * Convert VXLAN item to EFX filter specification.
+ *
+ * @param item[in]
+ *   Item specification. Only VXLAN network identifier field is supported.
+ *   If the mask is NULL, default mask will be used.
+ *   Ranging is not supported.
+ * @param efx_spec[in, out]
+ *   EFX filter specification to update.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_parse_vxla

[dpdk-dev] [PATCH 12/14] net/sfc: multiply of specs with an unknown destination MAC

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

To filter all traffic, two hardware filter specifications need to be
created, one with the unknown unicast and one with the unknown multicast
destination MAC address match flag.

In terms of RTE flow API, this would require adding multiple flow rules
with corresponding ETH items. In order to avoid such a complication, the
patch implements a mechanism to auto-complete an underlying filter
representation of a flow rule in order to create additional filter
specifications featuring the missing match flags.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc_flow.c | 91 +-
 drivers/net/sfc/sfc_flow.h |  2 +-
 2 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 2d45827..7b26653 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -86,6 +86,8 @@ struct sfc_flow_copy_flag {
sfc_flow_spec_check *spec_check;
 };
 
+static sfc_flow_spec_set_vals sfc_flow_set_unknown_dst_flags;
+static sfc_flow_spec_check sfc_flow_check_unknown_dst_flags;
 static sfc_flow_spec_set_vals sfc_flow_set_ethertypes;
 static sfc_flow_spec_set_vals sfc_flow_set_ifrm_unknown_dst_flags;
 static sfc_flow_spec_check sfc_flow_check_ifrm_unknown_dst_flags;
@@ -1514,6 +1516,80 @@ sfc_flow_parse_actions(struct sfc_adapter *sa,
 }
 
 /**
+ * Set the EFX_FILTER_MATCH_UNKNOWN_UCAST_DST
+ * and EFX_FILTER_MATCH_UNKNOWN_MCAST_DST match flags in the same
+ * specifications after copying.
+ *
+ * @param spec[in, out]
+ *   SFC flow specification to update.
+ * @param filters_count_for_one_val[in]
+ *   How many specifications should have the same match flag, what is the
+ *   number of specifications before copying.
+ * @param error[out]
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_set_unknown_dst_flags(struct sfc_flow_spec *spec,
+  unsigned int filters_count_for_one_val,
+  struct rte_flow_error *error)
+{
+   unsigned int i;
+   static const efx_filter_match_flags_t vals[] = {
+   EFX_FILTER_MATCH_UNKNOWN_UCAST_DST,
+   EFX_FILTER_MATCH_UNKNOWN_MCAST_DST
+   };
+
+   if (filters_count_for_one_val * RTE_DIM(vals) != spec->count) {
+   rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+   "Number of specifications is incorrect while copying "
+   "by unknown destination flags");
+   return -rte_errno;
+   }
+
+   for (i = 0; i < spec->count; i++) {
+   /* The check above ensures that divisor can't be zero here */
+   spec->filters[i].efs_match_flags |=
+   vals[i / filters_count_for_one_val];
+   }
+
+   return 0;
+}
+
+/**
+ * Check that the following conditions are met:
+ * - the list of supported filters has a filter
+ *   with EFX_FILTER_MATCH_UNKNOWN_MCAST_DST flag instead of
+ *   EFX_FILTER_MATCH_UNKNOWN_UCAST_DST, since this filter will also
+ *   be inserted.
+ *
+ * @param match[in]
+ *   The match flags of filter.
+ * @param spec[in]
+ *   Specification to be supplemented.
+ * @param filter[in]
+ *   SFC filter with list of supported filters.
+ */
+static boolean_t
+sfc_flow_check_unknown_dst_flags(efx_filter_match_flags_t match,
+__rte_unused efx_filter_spec_t *spec,
+struct sfc_filter *filter)
+{
+   unsigned int i;
+   efx_filter_match_flags_t match_mcast_dst;
+
+   match_mcast_dst =
+   (match & ~EFX_FILTER_MATCH_UNKNOWN_UCAST_DST) |
+   EFX_FILTER_MATCH_UNKNOWN_MCAST_DST;
+   for (i = 0; i < filter->supported_match_num; i++) {
+   if (match_mcast_dst == filter->supported_match[i])
+   return B_TRUE;
+   }
+
+   return B_FALSE;
+}
+
+/**
  * Set the EFX_FILTER_MATCH_ETHER_TYPE match flag and EFX_ETHER_TYPE_IPV4 and
  * EFX_ETHER_TYPE_IPV6 values of the corresponding field in the same
  * specifications after copying.
@@ -1638,9 +1714,22 @@ 
sfc_flow_check_ifrm_unknown_dst_flags(efx_filter_match_flags_t match,
return B_FALSE;
 }
 
-/* Match flags that can be automatically added to filters */
+/*
+ * Match flags that can be automatically added to filters.
+ * Selecting the last minimum when searching for the copy flag ensures that the
+ * EFX_FILTER_MATCH_UNKNOWN_UCAST_DST flag has a higher priority than
+ * EFX_FILTER_MATCH_ETHER_TYPE. This is because the filter
+ * EFX_FILTER_MATCH_UNKNOWN_UCAST_DST is at the end of the list of supported
+ * filters.
+ */
 static const struct sfc_flow_copy_flag sfc_flow_copy_flags[] = {
{
+   .flag = EFX_FILTER_MATCH_UNKNOWN_UCAST_DST,
+   .vals_count = 2,
+   .set_vals = sfc_flow_set_unknown_dst_flags,
+ 

[dpdk-dev] [PATCH 10/14] net/sfc: multiply of specs with an unknown EtherType

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Hardware filter specification for encapsulated traffic must contain
EtherType. In terms of RTE flow API, this would require L3 item to be
used in the flow rule. In the simplest case, if the user needs to filter
encapsulated traffic without knowledge of exact EtherType, they will
have to create multiple variants of the flow rule featuring all possible
L3 items (IPv4, IPv6), respectively. In order to hide the gory details
and avoid such a complication, this patch implements a mechanism to
auto-complete the filter specifications if need be.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc_flow.c | 306 +++--
 drivers/net/sfc/sfc_flow.h |   2 +-
 2 files changed, 266 insertions(+), 42 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index a432936..244fcdb 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -64,6 +64,21 @@ static sfc_flow_item_parse sfc_flow_parse_vxlan;
 static sfc_flow_item_parse sfc_flow_parse_geneve;
 static sfc_flow_item_parse sfc_flow_parse_nvgre;
 
+typedef int (sfc_flow_spec_set_vals)(struct sfc_flow_spec *spec,
+unsigned int filters_count_for_one_val,
+struct rte_flow_error *error);
+
+struct sfc_flow_copy_flag {
+   /* EFX filter specification match flag */
+   efx_filter_match_flags_t flag;
+   /* Number of values of corresponding field */
+   unsigned int vals_count;
+   /* Function to set values in specifications */
+   sfc_flow_spec_set_vals *set_vals;
+};
+
+static sfc_flow_spec_set_vals sfc_flow_set_ethertypes;
+
 static boolean_t
 sfc_flow_is_zero(const uint8_t *buf, unsigned int size)
 {
@@ -244,16 +259,9 @@ sfc_flow_parse_eth(const struct rte_flow_item *item,
if (rc != 0)
return rc;
 
-   /*
-* If "spec" is not set, could be any Ethernet, but for the inner frame
-* type of destination MAC must be set
-*/
-   if (spec == NULL) {
-   if (is_ifrm)
-   goto fail_bad_ifrm_dst_mac;
-   else
-   return 0;
-   }
+   /* If "spec" is not set, could be any Ethernet */
+   if (spec == NULL)
+   return 0;
 
if (is_same_ether_addr(&mask->dst, &supp_mask.dst)) {
efx_spec->efs_match_flags |= is_ifrm ?
@@ -273,8 +281,6 @@ sfc_flow_parse_eth(const struct rte_flow_item *item,
EFX_FILTER_MATCH_UNKNOWN_MCAST_DST;
} else if (!is_zero_ether_addr(&mask->dst)) {
goto fail_bad_mask;
-   } else if (is_ifrm) {
-   goto fail_bad_ifrm_dst_mac;
}
 
/*
@@ -308,13 +314,6 @@ sfc_flow_parse_eth(const struct rte_flow_item *item,
   RTE_FLOW_ERROR_TYPE_ITEM, item,
   "Bad mask in the ETH pattern item");
return -rte_errno;
-
-fail_bad_ifrm_dst_mac:
-   rte_flow_error_set(error, EINVAL,
-  RTE_FLOW_ERROR_TYPE_ITEM, item,
-  "Type of destination MAC address in inner frame "
-  "must be set");
-   return -rte_errno;
 }
 
 /**
@@ -782,14 +781,9 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct 
rte_flow_item *item,
}
}
 
-   if (!(efx_spec->efs_match_flags & EFX_FILTER_MATCH_ETHER_TYPE)) {
-   rte_flow_error_set(error, EINVAL,
-   RTE_FLOW_ERROR_TYPE_ITEM, item,
-   "Outer frame EtherType in pattern with tunneling "
-   "must be set");
-   return -rte_errno;
-   } else if (efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV4 &&
-  efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV6) {
+   if (efx_spec->efs_match_flags & EFX_FILTER_MATCH_ETHER_TYPE &&
+   efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV4 &&
+   efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV6) {
rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ITEM, item,
"Outer frame EtherType in pattern with tunneling "
@@ -1508,6 +1502,246 @@ sfc_flow_parse_actions(struct sfc_adapter *sa,
return 0;
 }
 
+/**
+ * Set the EFX_FILTER_MATCH_ETHER_TYPE match flag and EFX_ETHER_TYPE_IPV4 and
+ * EFX_ETHER_TYPE_IPV6 values of the corresponding field in the same
+ * specifications after copying.
+ *
+ * @param spec[in, out]
+ *   SFC flow specification to update.
+ * @param filters_count_for_one_val[in]
+ *   How many specifications should have the same EtherType value, what is the
+ *   number of specifications before copying.
+ * @param error[out]
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_set_ethertypes(struct sfc_flow_spec *spec,
+   unsig

[dpdk-dev] [PATCH 08/14] net/sfc: add inner frame ETH in flow API filters support

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Support destination MAC address match in inner frames.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
Reviewed-by: Andy Moreton 
---
 doc/guides/nics/sfc_efx.rst |  4 ++-
 drivers/net/sfc/sfc_flow.c  | 73 +++--
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index 943fe55..539ce90 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -152,7 +152,9 @@ Supported pattern items:
 - VOID
 
 - ETH (exact match of source/destination addresses, individual/group match
-  of destination address, EtherType)
+  of destination address, EtherType in the outer frame and exact match of
+  destination addresses, individual/group match of destination address in
+  the inner frame)
 
 - VLAN (exact match of VID, double-tagging is supported)
 
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index efdc664..c942a36 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -187,11 +187,11 @@ sfc_flow_parse_void(__rte_unused const struct rte_flow_item *item,
  * Convert Ethernet item to EFX filter specification.
  *
  * @param item[in]
- *   Item specification. Only source and destination addresses and
- *   Ethernet type fields are supported. In addition to full and
- *   empty masks of destination address, individual/group mask is
- *   also supported. If the mask is NULL, default mask will be used.
- *   Ranging is not supported.
+ *   Item specification. Outer frame specification may only comprise
+ *   source/destination addresses and Ethertype field.
+ *   Inner frame specification may contain destination address only.
+ *   There is support for individual/group mask as well as for empty and full.
+ *   If the mask is NULL, default mask will be used. Ranging is not supported.
  * @param efx_spec[in, out]
  *   EFX filter specification to update.
  * @param[out] error
@@ -210,40 +210,75 @@ sfc_flow_parse_eth(const struct rte_flow_item *item,
.src.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
.type = 0x,
};
+   const struct rte_flow_item_eth ifrm_supp_mask = {
+   .dst.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+   };
const uint8_t ig_mask[EFX_MAC_ADDR_LEN] = {
0x01, 0x00, 0x00, 0x00, 0x00, 0x00
};
+   const struct rte_flow_item_eth *supp_mask_p;
+   const struct rte_flow_item_eth *def_mask_p;
+   uint8_t *loc_mac = NULL;
+   boolean_t is_ifrm = (efx_spec->efs_encap_type !=
+   EFX_TUNNEL_PROTOCOL_NONE);
+
+   if (is_ifrm) {
+   supp_mask_p = &ifrm_supp_mask;
+   def_mask_p = &ifrm_supp_mask;
+   loc_mac = efx_spec->efs_ifrm_loc_mac;
+   } else {
+   supp_mask_p = &supp_mask;
+   def_mask_p = &rte_flow_item_eth_mask;
+   loc_mac = efx_spec->efs_loc_mac;
+   }
 
rc = sfc_flow_parse_init(item,
 (const void **)&spec,
 (const void **)&mask,
-&supp_mask,
-&rte_flow_item_eth_mask,
+supp_mask_p, def_mask_p,
 sizeof(struct rte_flow_item_eth),
 error);
if (rc != 0)
return rc;
 
-   /* If "spec" is not set, could be any Ethernet */
-   if (spec == NULL)
-   return 0;
+   /*
+* If "spec" is not set, it could be any Ethernet, but for the inner
+* frame the destination MAC must be set
+*/
+   if (spec == NULL) {
+   if (is_ifrm)
+   goto fail_bad_ifrm_dst_mac;
+   else
+   return 0;
+   }
 
if (is_same_ether_addr(&mask->dst, &supp_mask.dst)) {
-   efx_spec->efs_match_flags |= EFX_FILTER_MATCH_LOC_MAC;
-   rte_memcpy(efx_spec->efs_loc_mac, spec->dst.addr_bytes,
+   efx_spec->efs_match_flags |= is_ifrm ?
+   EFX_FILTER_MATCH_IFRM_LOC_MAC :
+   EFX_FILTER_MATCH_LOC_MAC;
+   rte_memcpy(loc_mac, spec->dst.addr_bytes,
   EFX_MAC_ADDR_LEN);
} else if (memcmp(mask->dst.addr_bytes, ig_mask,
  EFX_MAC_ADDR_LEN) == 0) {
if (is_unicast_ether_addr(&spec->dst))
-   efx_spec->efs_match_flags |=
+   efx_spec->efs_match_flags |= is_ifrm ?
+   EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST :
EFX_FILTER_MATCH_UNKNOWN_UCAST_DST;
else
-   efx_spec->efs_match_flags |=
+   efx_spec->efs_match_flags |= is_ifrm ?
+  

[dpdk-dev] [PATCH 13/14] net/sfc: avoid creation of ineffective flow rules

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Despite being versatile, the hardware support for filtering has a number
of special properties which must be taken into account. Namely, there is
a known set of valid filters which don't take any effect despite being
accepted by the hardware.

The combinations of match flags and field values which can describe the
exceptional filters are as follows:
- ETHER_TYPE or ETHER_TYPE | LOC_MAC with IPv4 or IPv6 EtherType
- ETHER_TYPE | IP_PROTO or ETHER_TYPE | IP_PROTO | LOC_MAC with UDP or
TCP IP protocol value
- The same combinations with OUTER_VID and/or INNER_VID

These exceptional filters can be expressed in terms of RTE flow rules.
If the user creates such a flow rule, no traffic will hit the underlying
filter, and no errors will be reported.

This patch adds a means to prevent such ineffective flow rules from
being created.
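
For illustration only, the exception rules listed above can be sketched as a
standalone check. The flag and protocol constants below are made-up stand-ins,
not the real efx definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up stand-ins for the EFX match flags and protocol constants */
#define M_ETHER_TYPE	0x01u
#define M_LOC_MAC	0x02u
#define M_IP_PROTO	0x04u
#define M_OUTER_VID	0x08u
#define M_INNER_VID	0x10u

#define ETYPE_IPV4	0x0800
#define ETYPE_IPV6	0x86DD
#define PROTO_TCP	6
#define PROTO_UDP	17

/* True if the flags equal the pattern, optionally extended with VIDs */
static bool match_with_vids(uint32_t flags, uint32_t pattern)
{
	uint32_t extra;

	if ((flags & pattern) != pattern)
		return false;
	extra = flags & ~pattern;
	return extra == 0 ||
	       extra == M_OUTER_VID ||
	       extra == (M_OUTER_VID | M_INNER_VID);
}

/* True if a filter is valid but known to match no traffic */
bool is_ineffective(uint32_t flags, uint16_t etype, uint8_t proto)
{
	if (match_with_vids(flags, M_ETHER_TYPE) ||
	    match_with_vids(flags, M_ETHER_TYPE | M_LOC_MAC))
		return etype == ETYPE_IPV4 || etype == ETYPE_IPV6;
	if (match_with_vids(flags, M_ETHER_TYPE | M_IP_PROTO) ||
	    match_with_vids(flags, M_ETHER_TYPE | M_IP_PROTO | M_LOC_MAC))
		return proto == PROTO_TCP || proto == PROTO_UDP;
	return false;
}
```

An ETHER_TYPE-only filter on IPv4/IPv6 is flagged, while the same filter on,
say, ARP is not.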

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
---
 doc/guides/nics/sfc_efx.rst | 17 ++
 drivers/net/sfc/sfc_flow.c  | 78 +
 2 files changed, 95 insertions(+)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index 539ce90..f41ccdb 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -193,6 +193,23 @@ in the mask of destination address. If destination address in the spec is
 multicast, it matches all multicast (and broadcast) packets, otherwise it
 matches unicast packets that are not filtered by other flow rules.
 
+Exceptions to flow rules
+------------------------
+
+There is a list of exceptional flow rule patterns which will not be
+accepted by the PMD. A pattern will be rejected if at least one of the
+conditions is met:
+
+- Filtering by IPv4 or IPv6 EtherType without pattern items of internet
+  layer and above.
+
+- The last item is IPV4 or IPV6, and it's empty.
+
+- Filtering by TCP or UDP IP transport protocol without pattern items of
+  transport layer and above.
+
+- The last item is TCP or UDP, and it's empty.
+
 
 Supported NICs
 --------------
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 7b26653..2b8bef8 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -1919,6 +1919,77 @@ sfc_flow_spec_filters_complete(struct sfc_adapter *sa,
return 0;
 }
 
+/**
+ * Check that set of match flags is referred to by a filter. Filter is
+ * described by match flags with the ability to add OUTER_VID and INNER_VID
+ * flags.
+ *
+ * @param match_flags[in]
+ *   Set of match flags.
+ * @param flags_pattern[in]
+ *   Pattern of filter match flags.
+ */
+static boolean_t
+sfc_flow_is_match_with_vids(efx_filter_match_flags_t match_flags,
+   efx_filter_match_flags_t flags_pattern)
+{
+   if ((match_flags & flags_pattern) != flags_pattern)
+   return B_FALSE;
+
+   switch (match_flags & ~flags_pattern) {
+   case 0:
+   case EFX_FILTER_MATCH_OUTER_VID:
+   case EFX_FILTER_MATCH_OUTER_VID | EFX_FILTER_MATCH_INNER_VID:
+   return B_TRUE;
+   default:
+   return B_FALSE;
+   }
+}
+
+/**
+ * Check whether the spec maps to a hardware filter which is known to be
+ * ineffective despite being valid.
+ *
+ * @param spec[in]
+ *   SFC flow specification.
+ */
+static boolean_t
+sfc_flow_is_match_flags_exception(struct sfc_flow_spec *spec)
+{
+   unsigned int i;
+   uint16_t ether_type;
+   uint8_t ip_proto;
+   efx_filter_match_flags_t match_flags;
+
+   for (i = 0; i < spec->count; i++) {
+   match_flags = spec->filters[i].efs_match_flags;
+
+   if (sfc_flow_is_match_with_vids(match_flags,
+   EFX_FILTER_MATCH_ETHER_TYPE) ||
+   sfc_flow_is_match_with_vids(match_flags,
+   EFX_FILTER_MATCH_ETHER_TYPE |
+   EFX_FILTER_MATCH_LOC_MAC)) {
+   ether_type = spec->filters[i].efs_ether_type;
+   if (ether_type == EFX_ETHER_TYPE_IPV4 ||
+   ether_type == EFX_ETHER_TYPE_IPV6)
+   return B_TRUE;
+   } else if (sfc_flow_is_match_with_vids(match_flags,
+   EFX_FILTER_MATCH_ETHER_TYPE |
+   EFX_FILTER_MATCH_IP_PROTO) ||
+  sfc_flow_is_match_with_vids(match_flags,
+   EFX_FILTER_MATCH_ETHER_TYPE |
+   EFX_FILTER_MATCH_IP_PROTO |
+   EFX_FILTER_MATCH_LOC_MAC)) {
+   ip_proto = spec->filters[i].efs_ip_proto;
+   if (ip_proto == EFX_IPPROTO_TCP ||
+   ip_proto == EFX_IPPROTO_UDP)
+   return B_TRUE;
+   }
+   }
+
+   return B_FALSE;
+}
+
 static int
 sfc_flo

[dpdk-dev] [PATCH 09/14] net/sfc: add infrastructure to make many filters from flow

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Not all flow rules can be expressed in one hardware filter, so some flow
rules have to be expressed in terms of multiple hardware filters. This
patch provides a means to produce a filter spec template from the flow
rule which then can be used to produce a set of fully elaborated specs
to be inserted.
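
As a rough sketch of the idea (simplified structures, not the PMD's real
sfc_flow_spec or efx_filter_spec_t): a template with an unset field is copied
once per possible value of that field.

```c
#include <stdint.h>

#define MAX_FILTERS 8

/* Hypothetical, simplified filter spec: ether_type == 0 means "unset" */
struct filter {
	uint16_t ether_type;
	uint16_t dmaq_id;
};

struct flow_spec {
	struct filter tmpl;                 /* template built from the flow rule */
	struct filter filters[MAX_FILTERS]; /* fully elaborated copies */
	unsigned int count;
};

/* Elaborate the template: if EtherType is set, one filter is enough;
 * otherwise copy the template once per possible EtherType value. */
int flow_spec_expand(struct flow_spec *spec,
		     const uint16_t *etypes, unsigned int n)
{
	unsigned int i;

	if (spec->tmpl.ether_type != 0) {
		spec->filters[0] = spec->tmpl;
		spec->count = 1;
		return 0;
	}
	if (n > MAX_FILTERS)
		return -1;
	for (i = 0; i < n; i++) {
		spec->filters[i] = spec->tmpl;
		spec->filters[i].ether_type = etypes[i];
	}
	spec->count = n;
	return 0;
}
```

All copies share the template's other fields (here, the destination queue),
which is what lets every elaborated filter deliver to the same place.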

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc_flow.c | 118 -
 drivers/net/sfc/sfc_flow.h |  19 +++-
 2 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index c942a36..a432936 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -25,10 +25,13 @@
 
 /*
 * As of now, the flow API is implemented in such a manner that each
- * flow rule is converted to a hardware filter.
+ * flow rule is converted to one or more hardware filters.
  * All elements of flow rule (attributes, pattern items, actions)
  * correspond to one or more fields in the efx_filter_spec_s structure
  * that is responsible for the hardware filter.
+ * If some required field is unset in the flow rule, then a handful
+ * of filter copies will be created to cover all possible values
+ * of such a field.
  */
 
 enum sfc_flow_item_layers {
@@ -1095,8 +1098,8 @@ sfc_flow_parse_attr(const struct rte_flow_attr *attr,
return -rte_errno;
}
 
-   flow->spec.efs_flags |= EFX_FILTER_FLAG_RX;
-   flow->spec.efs_rss_context = EFX_RSS_CONTEXT_DEFAULT;
+   flow->spec.template.efs_flags |= EFX_FILTER_FLAG_RX;
+   flow->spec.template.efs_rss_context = EFX_RSS_CONTEXT_DEFAULT;
 
return 0;
 }
@@ -1187,7 +1190,7 @@ sfc_flow_parse_pattern(const struct rte_flow_item pattern[],
break;
}
 
-   rc = item->parse(pattern, &flow->spec, error);
+   rc = item->parse(pattern, &flow->spec.template, error);
if (rc != 0)
return rc;
 
@@ -1209,7 +1212,7 @@ sfc_flow_parse_queue(struct sfc_adapter *sa,
return -EINVAL;
 
rxq = sa->rxq_info[queue->index].rxq;
-   flow->spec.efs_dmaq_id = (uint16_t)rxq->hw_index;
+   flow->spec.template.efs_dmaq_id = (uint16_t)rxq->hw_index;
 
return 0;
 }
@@ -1285,13 +1288,57 @@ sfc_flow_parse_rss(struct sfc_adapter *sa,
 #endif /* EFSYS_OPT_RX_SCALE */
 
 static int
+sfc_flow_spec_flush(struct sfc_adapter *sa, struct sfc_flow_spec *spec,
+   unsigned int filters_count)
+{
+   unsigned int i;
+   int ret = 0;
+
+   for (i = 0; i < filters_count; i++) {
+   int rc;
+
+   rc = efx_filter_remove(sa->nic, &spec->filters[i]);
+   if (ret == 0 && rc != 0) {
+   sfc_err(sa, "failed to remove filter specification "
+   "(rc = %d)", rc);
+   ret = rc;
+   }
+   }
+
+   return ret;
+}
+
+static int
+sfc_flow_spec_insert(struct sfc_adapter *sa, struct sfc_flow_spec *spec)
+{
+   unsigned int i;
+   int rc = 0;
+
+   for (i = 0; i < spec->count; i++) {
+   rc = efx_filter_insert(sa->nic, &spec->filters[i]);
+   if (rc != 0) {
+   sfc_flow_spec_flush(sa, spec, i);
+   break;
+   }
+   }
+
+   return rc;
+}
+
+static int
+sfc_flow_spec_remove(struct sfc_adapter *sa, struct sfc_flow_spec *spec)
+{
+   return sfc_flow_spec_flush(sa, spec, spec->count);
+}
+
+static int
 sfc_flow_filter_insert(struct sfc_adapter *sa,
   struct rte_flow *flow)
 {
-   efx_filter_spec_t *spec = &flow->spec;
-
 #if EFSYS_OPT_RX_SCALE
struct sfc_flow_rss *rss = &flow->rss_conf;
+   uint32_t efs_rss_context = EFX_RSS_CONTEXT_DEFAULT;
+   unsigned int i;
int rc = 0;
 
if (flow->rss) {
@@ -1302,27 +1349,38 @@ sfc_flow_filter_insert(struct sfc_adapter *sa,
rc = efx_rx_scale_context_alloc(sa->nic,
EFX_RX_SCALE_EXCLUSIVE,
rss_spread,
-   &spec->efs_rss_context);
+   &efs_rss_context);
if (rc != 0)
goto fail_scale_context_alloc;
 
-   rc = efx_rx_scale_mode_set(sa->nic, spec->efs_rss_context,
+   rc = efx_rx_scale_mode_set(sa->nic, efs_rss_context,
   EFX_RX_HASHALG_TOEPLITZ,
   rss->rss_hash_types, B_TRUE);
if (rc != 0)
goto fail_scale_mode_set;
 
-   rc = efx_rx_scale_key_set(sa->nic, spec->efs_rss_context,
+   rc = efx_rx_scale_key_set(sa->nic, efs_rss_context,
 

Re: [dpdk-dev] [PATCH v2] net/tap: add tun support

2018-02-27 Thread Pascal Mazon
> Thanks, my first idea was to use the same.
>
> Later argued myself in using 'tap_type' since the check for assigning MAC
> address goes well. Hence I hope not making the change 'tuntap_type' is ok?

Well yeah it's still readable, and makes that check simpler.
As we don't use that variable much, I'm ok with it.

Acked-by: Pascal Mazon 

On 26/02/2018 07:15, Vipin Varghese wrote:
> The change adds TUN PMD logic to the existing TAP PMD. TUN PMD can
> be initialized with 'net_tunX' where 'X' represents unique id. PMD
> supports argument interface, while MAC address and remote are not
> supported.
>
> Signed-off-by: Vipin Varghese 
> ---
>
> Changes in V2:
>  - updated the documentation word error - Pascal
> ---
>  doc/guides/nics/tap.rst   |  15 -
>  drivers/net/tap/rte_eth_tap.c | 132 
> +-
>  2 files changed, 118 insertions(+), 29 deletions(-)
>
> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
> index ea61be3..0fa 100644
> --- a/doc/guides/nics/tap.rst
> +++ b/doc/guides/nics/tap.rst
> @@ -1,8 +1,8 @@
>  ..  SPDX-License-Identifier: BSD-3-Clause
>  Copyright(c) 2016 Intel Corporation.
>  
> -Tap Poll Mode Driver
> -====================
> +Tun|Tap Poll Mode Driver
> +========================
>  
>  The ``rte_eth_tap.c`` PMD creates a device using TAP interfaces on the
>  local host. The PMD allows for DPDK and the host to communicate using a raw
> @@ -77,6 +77,17 @@ can utilize that stack to handle the network protocols. Plus you would be able
>  to address the interface using an IP address assigned to the internal
>  interface.
>  
> +The TUN PMD allows the user to create a TUN device on the host. The PMD allows
> +the user to transmit and receive packets via DPDK API calls with L3 header and
> +payload. The devices on the host can be accessed via the ``ifconfig`` or ``ip``
> +commands. TUN interfaces are passed to the DPDK ``rte_eal_init`` arguments as
> +``--vdev=net_tunX``, where X stands for a unique id, for example::
> +
> +   --vdev=net_tun0 --vdev=net_tun1,iface=foo1, ...
> +
> +Unlike the TAP PMD, the TUN PMD does not support user arguments such as
> +``MAC`` or ``remote``. The default interface name is ``dtunX``, where X stands
> +for a unique id.
> +
>  Flow API support
>  
>  
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index f09db0e..42c9db4 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -42,6 +42,7 @@
>  /* Linux based path to the TUN device */
>  #define TUN_TAP_DEV_PATH"/dev/net/tun"
>  #define DEFAULT_TAP_NAME"dtap"
> +#define DEFAULT_TUN_NAME"dtun"
>  
>  #define ETH_TAP_IFACE_ARG   "iface"
>  #define ETH_TAP_REMOTE_ARG  "remote"
> @@ -49,6 +50,7 @@
>  #define ETH_TAP_MAC_FIXED   "fixed"
>  
>  static struct rte_vdev_driver pmd_tap_drv;
> +static struct rte_vdev_driver pmd_tun_drv;
>  
>  static const char *valid_arguments[] = {
>   ETH_TAP_IFACE_ARG,
> @@ -58,6 +60,10 @@
>  };
>  
>  static int tap_unit;
> +static int tun_unit;
> +
> +static int tap_type;
> +static char tuntap_name[8];
>  
>  static volatile uint32_t tap_trigger;/* Rx trigger */
>  
> @@ -104,24 +110,26 @@ enum ioctl_mode {
>* Do not set IFF_NO_PI as packet information header will be needed
>* to check if a received packet has been truncated.
>*/
> - ifr.ifr_flags = IFF_TAP;
> + ifr.ifr_flags = (tap_type) ? IFF_TAP : IFF_TUN;
>   snprintf(ifr.ifr_name, IFNAMSIZ, "%s", pmd->name);
>  
>   RTE_LOG(DEBUG, PMD, "ifr_name '%s'\n", ifr.ifr_name);
>  
>   fd = open(TUN_TAP_DEV_PATH, O_RDWR);
>   if (fd < 0) {
> - RTE_LOG(ERR, PMD, "Unable to create TAP interface\n");
> + RTE_LOG(ERR, PMD, "Unable to create %s interface\n",
> + tuntap_name);
>   goto error;
>   }
>  
>  #ifdef IFF_MULTI_QUEUE
>   /* Grab the TUN features to verify we can work multi-queue */
>   if (ioctl(fd, TUNGETFEATURES, &features) < 0) {
> - RTE_LOG(ERR, PMD, "TAP unable to get TUN/TAP features\n");
> + RTE_LOG(ERR, PMD, "%s unable to get TUN/TAP features\n",
> + tuntap_name);
>   goto error;
>   }
> - RTE_LOG(DEBUG, PMD, "  TAP Features %08x\n", features);
> + RTE_LOG(DEBUG, PMD, " %s Features %08x\n", tuntap_name, features);
>  
>   if (features & IFF_MULTI_QUEUE) {
>   RTE_LOG(DEBUG, PMD, "  Multi-queue support for %d queues\n",
> @@ -1108,7 +1116,7 @@ enum ioctl_mode {
>   tmp = &(*tmp)->next;
>   }
>  
> - RTE_LOG(DEBUG, PMD, "  RX TAP device name %s, qid %d on fd %d

Re: [dpdk-dev] [PATCH 01/18] ethdev: support tunnel RSS level

2018-02-27 Thread Ferruh Yigit
On 2/26/2018 3:09 PM, Xueming Li wrote:
> Currently, PMD implementations default RSS to either tunnel outer or
> inner fields. This patch introduces an RSS level to allow the user to
> specify the RSS hash field level of tunneled packets.
> 
> 0: outer RSS.
> 1: inner RSS.
> 2-255: deep RSS level.
> 
> Please note that tunnels nested tightly, without an interleaved IP/UDP/TCP
> layer, are deemed as one level. For example, the following packet can
> only use level 0 or 1:
>   eth / ipv4 / GRE / MPLS / ipv4 / udp
> 
> Signed-off-by: Xueming Li 
> ---
>  lib/librte_ether/rte_ethdev.h | 9 +

Please remove the related deprecation notice in this patch.


Re: [dpdk-dev] [RFC 2/3] vhost: add SET_VIRTIO_STATUS support

2018-02-27 Thread Jens Freimann

On Thu, Feb 22, 2018 at 07:19:09PM +0100, Maxime Coquelin wrote:

This patch implements support for the new SET_VIRTIO_STATUS
vhost-user request.

The main use for this new request is for the backend to know
when the driver sets the DRIVER_OK status bit. Starting Virtio
1.0, we know that once the the bit is set, no more queues will
be initialized.
When it happens, this patch removes all queues starting from
the first uninitialized one, so that the port starts even if
the guest driver does not use all the queues provided by QEMU.
This is for example the case with Windows driver, which only
initializes as much queue pairs as vCPUs.

The second use for this request is when the status changes to
reset or failed state, the vhost port is stopped and virtqueues
cleaned and freed.

Signed-off-by: Maxime Coquelin 
---
lib/librte_vhost/vhost_user.c | 98 +++
lib/librte_vhost/vhost_user.h |  5 ++-
2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c256ebb06..7ab02c44b 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -67,6 +67,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
+   [VHOST_USER_SET_VIRTIO_STATUS]  = "VHOST_USER_SET_VIRTIO_STATUS",
};

static uint64_t
@@ -1244,6 +1245,100 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
return 0;
}

+static int
+vhost_user_set_virtio_status(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+   uint8_t old_status, new_status;
+   uint32_t i;
+
+   /* As per Virtio spec, the Virtio device status is 8 bits wide */
+   if (msg->payload.u64 != (uint8_t)msg->payload.u64) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Invalid Virtio dev status value (%lx)\n",
+   msg->payload.u64);
+   return -1;
+   }
+
+   new_status = msg->payload.u64;
+   old_status = dev->virtio_status;
+
+   if (new_status == old_status)
+   return 0;
+
+   RTE_LOG(DEBUG, VHOST_CONFIG,
+   "New Virtio device status %x (was %x)\n",
+   new_status, old_status);
+
+   dev->virtio_status = new_status;
+
+   if (new_status == 0 || new_status & VIRTIO_CONFIG_S_FAILED) {
+   /*
+* The device moved to reset or failed state,
+* stop processing the virtqueues
+*/
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
+   while (dev->nr_vring > 0) {
+   struct vhost_virtqueue *vq;
+
+   vq = dev->virtqueue[--dev->nr_vring];
+   if (!vq)
+   continue;
+
+   dev->virtqueue[dev->nr_vring] = NULL;
+   cleanup_vq(dev, vq, 1);
+   free_vq(vq);
+   }
+
+   return 0;
+   }
+
+   if ((dev->features & (1ULL << VIRTIO_F_VERSION_1)) &&
+   (new_status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+   !virtio_is_ready(dev)) {
+   /*
+* Since Virtio 1.0, we know that no more queues will be
+* setup after guest sets DRIVER_OK. So let's remove
+* uninitialized queues.
+*/
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "Driver is ready, but some queues aren't initialized\n");
+
+   /*
+* Find the first uninitialized queue.
+*
+* Note: Ideally the backend implementation should
+* support sparsed virtqueues, but as long as it is
+* not the case, let's remove all queues after the
+* first uninitialized one.
+*/
+   for (i = 0; i < dev->nr_vring; i++) {
+   if (!vq_is_ready(dev->virtqueue[i]))
+   break;
+   }
+
+   while (dev->nr_vring >= i) {
+   struct vhost_virtqueue *vq;
+
+   vq = dev->virtqueue[--dev->nr_vring];


If i is 0, we could access an array element out of bounds, no?
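
A minimal sketch of the concern: with `while (dev->nr_vring >= i)` and
`i == 0`, the body would run once more with `nr_vring == 0`, so
`--dev->nr_vring` would index `virtqueue[-1]` and wrap an unsigned counter.
The variant below uses the `>` bound:

```c
/* Simplified model of the queue-removal loop: queues [i, nr_vring) are
 * to be removed. Writing the condition as "nr_vring >= i" would, for
 * i == 0, execute the body with nr_vring == 0, so the "--nr_vring"
 * index would reference virtqueue[-1] (and the unsigned counter would
 * wrap). The ">" bound stops exactly at index i. */
unsigned int remove_from(unsigned int nr_vring, unsigned int i)
{
	while (nr_vring > i)
		--nr_vring;	/* the real code frees dev->virtqueue[nr_vring] here */
	return nr_vring;
}
```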

With this fixed,

Reviewed-by: Jens Freimann  


regards,
Jens 



Re: [dpdk-dev] [PATCH 02/18] app/testpmd: support flow RSS level parsing

2018-02-27 Thread Ferruh Yigit
On 2/26/2018 3:09 PM, Xueming Li wrote:
> Support new flow RSS level parameter to select inner or outer RSS
> fields. Example:
> 
>   flow create 0 ingress pattern eth  / ipv4 / udp dst is 4789 / vxlan /
> end actions rss queues 1 2 end level 1 / end
> 
> Signed-off-by: Xueming Li 
> ---
>  app/test-pmd/cmdline_flow.c | 27 +--

Isn't there any documentation file to update for this new parameter?



[dpdk-dev] [PATCH 3/3] doc: add flow API drop action support to net/sfc

2018-02-27 Thread Andrew Rybchenko
Signed-off-by: Andrew Rybchenko 
---
 doc/guides/rel_notes/release_18_05.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 894f636..d162daf 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -46,6 +46,7 @@ New Features
   Updated the sfc_efx driver including the following changes:
 
   * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
+  * Added support for DROP action in flow API.
 
 
 API Changes
-- 
2.7.4



[dpdk-dev] [PATCH 00/3] net/sfc: support drop action in flow API

2018-02-27 Thread Andrew Rybchenko
Update base driver and the PMD itself to support drop action in flow API.

It should be applied on top of [1].

[1] http://dpdk.org/ml/archives/dev/2018-February/091530.html

Andrew Rybchenko (1):
  doc: add flow API drop action support to net/sfc

Roman Zhukov (2):
  net/sfc/base: support drop filters on EF10 family NICs
  net/sfc: support DROP action in flow API

 doc/guides/nics/sfc_efx.rst|  2 ++
 doc/guides/rel_notes/release_18_05.rst |  1 +
 drivers/net/sfc/base/ef10_filter.c | 13 +
 drivers/net/sfc/sfc_flow.c |  7 +++
 4 files changed, 19 insertions(+), 4 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 1/3] net/sfc/base: support drop filters on EF10 family NICs

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Add support for filters which drop packets when forming MCDI request
for a filter.

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
---
 drivers/net/sfc/base/ef10_filter.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/sfc/base/ef10_filter.c b/drivers/net/sfc/base/ef10_filter.c
index 54e1c35..54ea9e3 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -209,10 +209,15 @@ efx_mcdi_filter_op_add(
EVB_PORT_ID_ASSIGNED);
MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_MATCH_FIELDS,
match_flags);
-   MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_DEST,
-   MC_CMD_FILTER_OP_EXT_IN_RX_DEST_HOST);
-   MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_QUEUE,
-   spec->efs_dmaq_id);
+   if (spec->efs_dmaq_id == EFX_FILTER_SPEC_RX_DMAQ_ID_DROP) {
+   MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_DEST,
+   MC_CMD_FILTER_OP_EXT_IN_RX_DEST_DROP);
+   } else {
+   MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_DEST,
+   MC_CMD_FILTER_OP_EXT_IN_RX_DEST_HOST);
+   MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_QUEUE,
+   spec->efs_dmaq_id);
+   }
 
 #if EFSYS_OPT_RX_SCALE
if (spec->efs_flags & EFX_FILTER_FLAG_RX_RSS) {
-- 
2.7.4



[dpdk-dev] [PATCH 2/3] net/sfc: support DROP action in flow API

2018-02-27 Thread Andrew Rybchenko
From: Roman Zhukov 

Signed-off-by: Roman Zhukov 
Signed-off-by: Andrew Rybchenko 
---
 doc/guides/nics/sfc_efx.rst | 2 ++
 drivers/net/sfc/sfc_flow.c  | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index f41ccdb..36e98d3 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -183,6 +183,8 @@ Supported actions:
 
 - RSS
 
+- DROP
+
 Validating flow rules depends on the firmware variant.
 
 Ethernet destination individual/group match
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 2b8bef8..4fe20a2 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -1497,6 +1497,13 @@ sfc_flow_parse_actions(struct sfc_adapter *sa,
break;
 #endif /* EFSYS_OPT_RX_SCALE */
 
+   case RTE_FLOW_ACTION_TYPE_DROP:
+   flow->spec.template.efs_dmaq_id =
+   EFX_FILTER_SPEC_RX_DMAQ_ID_DROP;
+
+   is_specified = B_TRUE;
+   break;
+
default:
rte_flow_error_set(error, ENOTSUP,
   RTE_FLOW_ERROR_TYPE_ACTION, actions,
-- 
2.7.4



[dpdk-dev] [PATCH v2 2/5] eal: don't process IPC messages before init finished

2018-02-27 Thread Anatoly Burakov
It is not possible for a primary process to receive any messages
while initializing, because the RTE_MAGIC value is not set in the
shared config, and hence no secondary process can ever spin up
during that time.

However, it is possible for a secondary process to receive messages
from the primary during initialization. We can't just drop the
messages as they may be important, and also we might need to process
replies to our own requests (e.g. VFIO) during initialization.

Therefore, add a tailq for incoming messages, queue them up until
initialization is complete, and process them in the order they
arrived.
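
A self-contained sketch of the queue-and-gate approach described above
(simplified message struct, not the real mp_msg_internal;
TAILQ_FOREACH_SAFE is defined locally since glibc's <sys/queue.h> lacks it):

```c
#include <stdlib.h>
#include <sys/queue.h>

/* glibc's <sys/queue.h> does not provide TAILQ_FOREACH_SAFE */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)		\
	for ((var) = TAILQ_FIRST(head);				\
	     (var) && ((tvar) = TAILQ_NEXT(var, field), 1);	\
	     (var) = (tvar))
#endif

/* Simplified message types: only replies (MP_REP) may be handled
 * before initialization completes. */
enum msg_type { MP_MSG, MP_REQ, MP_REP };

struct entry {
	TAILQ_ENTRY(entry) next;
	enum msg_type type;
	int id;
};

TAILQ_HEAD(queue, entry);

void enqueue(struct queue *q, enum msg_type t, int id)
{
	struct entry *e = malloc(sizeof(*e));

	e->type = t;
	e->id = id;
	TAILQ_INSERT_TAIL(q, e, next);
}

/* Drain processable messages in arrival order; anything that may not be
 * processed yet stays queued. Returns the number processed. */
int drain(struct queue *q, int init_complete, int *processed_ids, int max)
{
	struct entry *e, *tmp;
	int n = 0;

	TAILQ_FOREACH_SAFE(e, q, next, tmp) {
		if (!init_complete && e->type != MP_REP)
			continue;	/* keep it queued for later */
		TAILQ_REMOVE(q, e, next);
		if (n < max)
			processed_ids[n] = e->id;
		n++;
		free(e);
	}
	return n;
}
```

Before init completes only the reply is drained; a later pass with the gate
open drains the remaining messages in their original order.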

Signed-off-by: Anatoly Burakov 
---

Notes:
v2: no changes

 lib/librte_eal/common/eal_common_proc.c | 50 +
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 3a1088e..b4d00c3 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 #include "eal_filesystem.h"
@@ -58,6 +59,18 @@ struct mp_msg_internal {
struct rte_mp_msg msg;
 };
 
+struct message_queue_entry {
+   TAILQ_ENTRY(message_queue_entry) next;
+   struct mp_msg_internal msg;
+   struct sockaddr_un sa;
+};
+
+/** Double linked list of received messages. */
+TAILQ_HEAD(message_queue, message_queue_entry);
+
+static struct message_queue message_queue =
+   TAILQ_HEAD_INITIALIZER(message_queue);
+
 struct sync_request {
TAILQ_ENTRY(sync_request) next;
int reply_received;
@@ -276,12 +289,39 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s)
 static void *
 mp_handle(void *arg __rte_unused)
 {
-   struct mp_msg_internal msg;
-   struct sockaddr_un sa;
-
+   struct message_queue_entry *cur_msg, *next_msg, *new_msg = NULL;
while (1) {
-   if (read_msg(&msg, &sa) == 0)
-   process_msg(&msg, &sa);
+   /* we want to process all messages in order of their arrival,
+* but status of init_complete may change while we're iterating
+* the tailq. so, store it here and check once every iteration.
+*/
+   int init_complete = internal_config.init_complete;
+
+   if (new_msg == NULL)
+   new_msg = malloc(sizeof(*new_msg));
+   if (read_msg(&new_msg->msg, &new_msg->sa) == 0) {
+   /* we successfully read the message, so enqueue it */
+   TAILQ_INSERT_TAIL(&message_queue, new_msg, next);
+   new_msg = NULL;
+   } /* reuse new_msg for next message if we couldn't read_msg */
+
+   /* tailq only accessed here, so no locking needed */
+   TAILQ_FOREACH_SAFE(cur_msg, &message_queue, next, next_msg) {
+   /* secondary process should not process any incoming
+* requests until its initialization is complete, but
+* it is allowed to process replies to its own queries.
+*/
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+   !init_complete &&
+   cur_msg->msg.type != MP_REP)
+   continue;
+
+   TAILQ_REMOVE(&message_queue, cur_msg, next);
+
+   process_msg(&cur_msg->msg, &cur_msg->sa);
+
+   free(cur_msg);
+   }
}
 
return NULL;
-- 
2.7.4


[dpdk-dev] [PATCH v2 4/5] eal: prevent secondary process init while sending messages

2018-02-27 Thread Anatoly Burakov
Currently, it is possible to spin up a secondary process while
either sendmsg or request is in progress. Fix this by adding
directory locks during init, sendmsg and requests.
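
A minimal sketch of the directory-lock pattern the patch applies: open the
directory read-only, take an exclusive advisory flock(), do the critical
work, then unlock and close.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive advisory lock on a directory. Returns the open fd
 * (to be passed to unlock_dir() later) or -1 on failure. This mirrors
 * the open()/flock(LOCK_EX) ... flock(LOCK_UN)/close() sequence the
 * patch adds around rte_mp_channel_init(), mp_send() and
 * rte_mp_request(). */
int lock_dir(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

void unlock_dir(int fd)
{
	flock(fd, LOCK_UN);
	close(fd);
}
```

Since flock() locks are advisory, this only serializes processes that take
the same lock, which is exactly the EAL init-vs-send case here.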

Signed-off-by: Anatoly Burakov 
---

Notes:
v2: added this patch

 lib/librte_eal/common/eal_common_proc.c | 47 -
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 17fded7..82ea4a7 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -505,6 +505,7 @@ rte_mp_channel_init(void)
 {
char thread_name[RTE_MAX_THREAD_NAME_LEN];
char *path;
+   int dir_fd;
pthread_t tid;
 
snprintf(mp_filter, PATH_MAX, ".%s_unix_*",
@@ -514,14 +515,32 @@ rte_mp_channel_init(void)
snprintf(mp_dir_path, PATH_MAX, "%s", dirname(path));
free(path);
 
+   /* lock the directory */
+   dir_fd = open(mp_dir_path, O_RDONLY);
+   if (dir_fd < 0) {
+   RTE_LOG(ERR, EAL, "failed to open %s: %s\n",
+   mp_dir_path, strerror(errno));
+   return -1;
+   }
+
+   if (flock(dir_fd, LOCK_EX)) {
+   RTE_LOG(ERR, EAL, "failed to lock %s: %s\n",
+   mp_dir_path, strerror(errno));
+   close(dir_fd);
+   return -1;
+   }
+
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
unlink_sockets(mp_filter)) {
RTE_LOG(ERR, EAL, "failed to unlink mp sockets\n");
+   close(dir_fd);
return -1;
}
 
-   if (open_socket_fd() < 0)
+   if (open_socket_fd() < 0) {
+   close(dir_fd);
return -1;
+   }
 
if (pthread_create(&tid, NULL, mp_handle, NULL) < 0) {
RTE_LOG(ERR, EAL, "failed to create mp thead: %s\n",
@@ -534,6 +553,11 @@ rte_mp_channel_init(void)
/* try best to set thread name */
snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "rte_mp_handle");
rte_thread_setname(tid, thread_name);
+
+   /* unlock the directory */
+   flock(dir_fd, LOCK_UN);
+   close(dir_fd);
+
return 0;
 }
 
@@ -648,6 +672,14 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type)
return -1;
}
dir_fd = dirfd(mp_dir);
+   /* lock the directory to prevent processes spinning up while we send */
+   if (flock(dir_fd, LOCK_EX)) {
+   RTE_LOG(ERR, EAL, "Unable to lock directory %s\n",
+   mp_dir_path);
+   rte_errno = errno;
+   closedir(mp_dir);
+   return -1;
+   }
while ((ent = readdir(mp_dir))) {
char path[PATH_MAX];
const char *peer_name;
@@ -671,6 +703,8 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type)
else if (active > 0 && send_msg(path, msg, type) < 0)
ret = -1;
}
+   /* unlock the dir */
+   flock(dir_fd, LOCK_UN);
 
closedir(mp_dir);
return ret;
@@ -830,6 +864,15 @@ rte_mp_request(struct rte_mp_msg *req, struct rte_mp_reply *reply,
}
dir_fd = dirfd(mp_dir);
 
+   /* lock the directory to prevent processes spinning up while we send */
+   if (flock(dir_fd, LOCK_EX)) {
+   RTE_LOG(ERR, EAL, "Unable to lock directory %s\n",
+   mp_dir_path);
+   closedir(mp_dir);
+   rte_errno = errno;
+   return -1;
+   }
+
while ((ent = readdir(mp_dir))) {
const char *peer_name;
char path[PATH_MAX];
@@ -855,6 +898,8 @@ rte_mp_request(struct rte_mp_msg *req, struct rte_mp_reply *reply,
if (mp_request_one(path, req, reply, &end))
ret = -1;
}
+   /* unlock the directory */
+   flock(dir_fd, LOCK_UN);
 
closedir(mp_dir);
return ret;
-- 
2.7.4


[dpdk-dev] [PATCH v2 1/5] eal: add internal flag indicating init has completed

2018-02-27 Thread Anatoly Burakov
Currently, primary process initialization is finalized by setting
the RTE_MAGIC value in the shared config. However, it is not
possible to check whether secondary process initialization has
completed. Add such a value to internal config.

Signed-off-by: Anatoly Burakov 
---

Notes:
This patch is dependent upon earlier IPC fixes patchset [1].

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/

v2 changes: none

 lib/librte_eal/common/eal_common_options.c | 1 +
 lib/librte_eal/common/eal_internal_cfg.h   | 2 ++
 lib/librte_eal/linuxapp/eal/eal.c  | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 9f2f8d2..0be80cb 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -194,6 +194,7 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
internal_cfg->vmware_tsc_map = 0;
internal_cfg->create_uio_dev = 0;
internal_cfg->user_mbuf_pool_ops_name = NULL;
+   internal_cfg->init_complete = 0;
 }
 
 static int
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index 1169fcc..4e2c2e6 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -56,6 +56,8 @@ struct internal_config {
/**< user defined mbuf pool ops name */
unsigned num_hugepage_sizes;  /**< how many sizes on this system */
struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES];
+   unsigned int init_complete;
+   /**< indicates whether EAL has completed initialization */
 };
 extern struct internal_config internal_config; /**< Global EAL configuration. 
*/
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 38306bf..2ecd07b 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -669,6 +669,8 @@ rte_eal_mcfg_complete(void)
/* ALL shared mem_config related INIT DONE */
if (rte_config.process_type == RTE_PROC_PRIMARY)
rte_config.mem_config->magic = RTE_MAGIC;
+
+   internal_config.init_complete = 1;
 }
 
 /*
-- 
2.7.4


[dpdk-dev] [PATCH v2 5/5] eal: don't hardcode socket filter value in IPC

2018-02-27 Thread Anatoly Burakov
Currently, the filter value is hardcoded and disconnected from the
actual value returned by eal_mp_socket_path(). Fix this by deriving
the filter value from eal_mp_socket_path() instead.

Signed-off-by: Anatoly Burakov 
---

Notes:
v2: added this patch

 lib/librte_eal/common/eal_common_proc.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 82ea4a7..bdea6d6 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -504,16 +504,17 @@ int
 rte_mp_channel_init(void)
 {
char thread_name[RTE_MAX_THREAD_NAME_LEN];
-   char *path;
+   char path[PATH_MAX];
int dir_fd;
pthread_t tid;
 
-   snprintf(mp_filter, PATH_MAX, ".%s_unix_*",
-internal_config.hugefile_prefix);
+   /* create filter path */
+   create_socket_path("*", path, sizeof(path));
+   snprintf(mp_filter, sizeof(mp_filter), "%s", basename(path));
 
-   path = strdup(eal_mp_socket_path());
-   snprintf(mp_dir_path, PATH_MAX, "%s", dirname(path));
-   free(path);
+   /* path may have been modified, so recreate it */
+   create_socket_path("*", path, sizeof(path));
+   snprintf(mp_dir_path, sizeof(mp_dir_path), "%s", dirname(path));
 
/* lock the directory */
dir_fd = open(mp_dir_path, O_RDONLY);
-- 
2.7.4


[dpdk-dev] [PATCH v2 3/5] eal: use locks to determine if secondary process is active

2018-02-27 Thread Anatoly Burakov
Previously, IPC would remove sockets it considered "inactive"
based on whether they had responded. Change this to create lock
files in addition to socket files, so that we can determine whether
a secondary process is active before attempting to communicate with
it. That way, we can distinguish secondaries that are alive but
not responding from those that have already died.

Signed-off-by: Anatoly Burakov 
---

Notes:
v2: no changes

 lib/librte_eal/common/eal_common_proc.c | 204 +++-
 1 file changed, 175 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index b4d00c3..17fded7 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +33,7 @@
 #include "eal_internal_cfg.h"
 
 static int mp_fd = -1;
+static int lock_fd = -1;
 static char mp_filter[PATH_MAX];   /* Filter for secondary process sockets */
 static char mp_dir_path[PATH_MAX]; /* The directory path for all mp sockets */
 static pthread_mutex_t mp_mutex_action = PTHREAD_MUTEX_INITIALIZER;
@@ -104,6 +106,46 @@ find_sync_request(const char *dst, const char *act_name)
return r;
 }
 
+static void
+create_socket_path(const char *name, char *buf, int len)
+{
+   const char *prefix = eal_mp_socket_path();
+   if (strlen(name) > 0)
+   snprintf(buf, len, "%s_%s", prefix, name);
+   else
+   snprintf(buf, len, "%s", prefix);
+}
+
+static void
+create_lockfile_path(const char *name, char *buf, int len)
+{
+   const char *prefix = eal_mp_socket_path();
+   if (strlen(name) > 1)
+   snprintf(buf, len, "%slock_%s", prefix, name);
+   else
+   snprintf(buf, len, "%slock", prefix);
+}
+
+static const char *
+get_peer_name(const char *socket_full_path)
+{
+   char buf[PATH_MAX] = {0};
+   int len;
+
+   /* primary process has no peer name */
+   if (strcmp(socket_full_path, eal_mp_socket_path()) == 0)
+   return NULL;
+
+   /* construct dummy socket file name - make it one character long so that
+* we hit the code path where underscores are added
+*/
+   create_socket_path("a", buf, sizeof(buf));
+
+   /* we want to get everything after /path/.rte_unix_, so discard 'a' */
+   len = strlen(buf) - 1;
+   return &socket_full_path[len];
+}
+
 int
 rte_eal_primary_proc_alive(const char *config_file_path)
 {
@@ -330,8 +372,29 @@ mp_handle(void *arg __rte_unused)
 static int
 open_socket_fd(void)
 {
+   char peer_name[PATH_MAX] = {0};
+   char lockfile[PATH_MAX] = {0};
struct sockaddr_un un;
-   const char *prefix = eal_mp_socket_path();
+
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+   snprintf(peer_name, sizeof(peer_name), "%d_%"PRIx64,
+getpid(), rte_rdtsc());
+
+   /* try to create lockfile */
+   create_lockfile_path(peer_name, lockfile, sizeof(lockfile));
+
+   lock_fd = open(lockfile, O_CREAT | O_RDWR);
+   if (lock_fd < 0) {
+   RTE_LOG(ERR, EAL, "failed to open '%s': %s\n", lockfile,
+   strerror(errno));
+   return -1;
+   }
+   if (flock(lock_fd, LOCK_EX | LOCK_NB)) {
+   RTE_LOG(ERR, EAL, "failed to lock '%s': %s\n", lockfile,
+   strerror(errno));
+   return -1;
+   }
+   /* no need to downgrade to shared lock */
 
mp_fd = socket(AF_UNIX, SOCK_DGRAM, 0);
if (mp_fd < 0) {
@@ -341,13 +404,11 @@ open_socket_fd(void)
 
memset(&un, 0, sizeof(un));
un.sun_family = AF_UNIX;
-   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-   snprintf(un.sun_path, sizeof(un.sun_path), "%s", prefix);
-   else {
-   snprintf(un.sun_path, sizeof(un.sun_path), "%s_%d_%"PRIx64,
-prefix, getpid(), rte_rdtsc());
-   }
+
+   create_socket_path(peer_name, un.sun_path, sizeof(un.sun_path));
+
unlink(un.sun_path); /* May still exist since last run */
+
if (bind(mp_fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
RTE_LOG(ERR, EAL, "failed to bind %s: %s\n",
un.sun_path, strerror(errno));
@@ -359,6 +420,44 @@ open_socket_fd(void)
return mp_fd;
 }
 
+/* find corresponding lock file and try to lock it */
+static int
+socket_is_active(const char *peer_name)
+{
+   char lockfile[PATH_MAX] = {0};
+   int fd, ret = -1;
+
+   /* construct lockfile filename */
+   create_lockfile_path(peer_name, lockfile, sizeof(lockfile));
+
+   /* try to lock it */
+   fd = open(lockfile, O_CREAT | O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open '%s': %s\n", lockfile,
+   strerror(errno));
+  

Re: [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function

2018-02-27 Thread Thomas Monjalon
27/02/2018 12:01, Ferruh Yigit:
> On 2/27/2018 10:29 AM, Kirill Rybalchenko wrote:
> > In 18.02 release the ABI of ethdev component was changed.
> > To keep compatibility with previous versions of the library
> > the versioning of rte_eth_dev_filter_ctrl function was implemented.
> > As soon as deprecation note was issued in 18.02 release, there is
> > no need to keep compatibility with previous versions.
> > Remove the versioning of rte_eth_dev_filter_ctrl function.
> > 
> > Signed-off-by: Kirill Rybalchenko 
> > ---
> >  lib/librte_ether/rte_ethdev.c | 155 
> > +-
> 
> Hi Kirill,
> 
> You need to update .map file and removed deprecation notice in this patch.

And bump the ABI version in Makefile and release notes.




Re: [dpdk-dev] [PATCH 0/5] remove void pointer explicit cast

2018-02-27 Thread Ferruh Yigit
On 2/26/2018 8:10 AM, Zhiyong Yang wrote:
> This patch series cleans up explicit void pointer casts related to
> struct rte_flow_item fields in librte_flow_classify and makes the
> code more readable.
> 
> Zhiyong Yang (5):
>   flow_classify: remove void pointer cast
>   net/ixgbe: remove void pointer cast
>   net/e1000: remove void pointer cast
>   net/bnxt: remove void pointer cast
>   net/sfc: remove void pointer cast

For series
Reviewed-by: Ferruh Yigit 


Re: [dpdk-dev] [RFC 2/3] vhost: add SET_VIRTIO_STATUS support

2018-02-27 Thread Maxime Coquelin



On 02/27/2018 02:10 PM, Jens Freimann wrote:

On Thu, Feb 22, 2018 at 07:19:09PM +0100, Maxime Coquelin wrote:

This patch implements support for the new SET_VIRTIO_STATUS
vhost-user request.

The main use for this new request is for the backend to know
when the driver sets the DRIVER_OK status bit. Starting Virtio
1.0, we know that once the bit is set, no more queues will
be initialized.
When it happens, this patch removes all queues starting from
the first uninitialized one, so that the port starts even if
the guest driver does not use all the queues provided by QEMU.
This is for example the case with the Windows driver, which only
initializes as many queue pairs as there are vCPUs.

The second use for this request is when the status changes to
reset or failed state, the vhost port is stopped and virtqueues
cleaned and freed.

Signed-off-by: Maxime Coquelin 
---
lib/librte_vhost/vhost_user.c | 98 
+++

lib/librte_vhost/vhost_user.h |  5 ++-
2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c 
b/lib/librte_vhost/vhost_user.c

index c256ebb06..7ab02c44b 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -67,6 +67,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] 
= {

[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
+    [VHOST_USER_SET_VIRTIO_STATUS]  = "VHOST_USER_SET_VIRTIO_STATUS",
};

static uint64_t
@@ -1244,6 +1245,100 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, 
struct VhostUserMsg *msg)

return 0;
}

+static int
+vhost_user_set_virtio_status(struct virtio_net *dev, struct 
VhostUserMsg *msg)

+{
+    uint8_t old_status, new_status;
+    uint32_t i;
+
+    /* As per Virtio spec, the Virtio device status is 8 bits wide */
+    if (msg->payload.u64 != (uint8_t)msg->payload.u64) {
+    RTE_LOG(ERR, VHOST_CONFIG,
+    "Invalid Virtio dev status value (%lx)\n",
+    msg->payload.u64);
+    return -1;
+    }
+
+    new_status = msg->payload.u64;
+    old_status = dev->virtio_status;
+
+    if (new_status == old_status)
+    return 0;
+
+    RTE_LOG(DEBUG, VHOST_CONFIG,
+    "New Virtio device status %x (was %x)\n",
+    new_status, old_status);
+
+    dev->virtio_status = new_status;
+
+    if (new_status == 0 || new_status & VIRTIO_CONFIG_S_FAILED) {
+    /*
+ * The device moved to reset or failed state,
+ * stop processing the virtqueues
+ */
+    if (dev->flags & VIRTIO_DEV_RUNNING) {
+    dev->flags &= ~VIRTIO_DEV_RUNNING;
+    dev->notify_ops->destroy_device(dev->vid);
+    }
+
+    while (dev->nr_vring > 0) {
+    struct vhost_virtqueue *vq;
+
+    vq = dev->virtqueue[--dev->nr_vring];
+    if (!vq)
+    continue;
+
+    dev->virtqueue[dev->nr_vring] = NULL;
+    cleanup_vq(dev, vq, 1);
+    free_vq(vq);
+    }
+
+    return 0;
+    }
+
+    if ((dev->features & (1ULL << VIRTIO_F_VERSION_1)) &&
+    (new_status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+    !virtio_is_ready(dev)) {
+    /*
+ * Since Virtio 1.0, we know that no more queues will be
+ * setup after guest sets DRIVER_OK. So let's remove
+ * uninitialized queues.
+ */
+    RTE_LOG(INFO, VHOST_CONFIG,
+    "Driver is ready, but some queues aren't 
initialized\n");

+
+    /*
+ * Find the first uninitialized queue.
+ *
+ * Note: Ideally the backend implementation should
+ * support sparse virtqueues, but as long as it is
+ * not the case, let's remove all queues after the
+ * first uninitialized one.
+ */
+    for (i = 0; i < dev->nr_vring; i++) {
+    if (!vq_is_ready(dev->virtqueue[i]))
+    break;
+    }
+
+    while (dev->nr_vring >= i) {
+    struct vhost_virtqueue *vq;
+
+    vq = dev->virtqueue[--dev->nr_vring];


If i is 0, we could access an array element out of bounds, no?



Thanks for spotting this off-by-one error, it should be:
+while (dev->nr_vring > i) {


With this fixed,

Reviewed-by: Jens Freimann 
regards,
Jens


Thanks,
Maxime


[dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function

2018-02-27 Thread Kirill Rybalchenko
In the 18.02 release the ABI of the ethdev component was changed.
To keep compatibility with previous versions of the library,
versioning of the rte_eth_dev_filter_ctrl function was implemented.
Since a deprecation notice was issued in the 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of the rte_eth_dev_filter_ctrl function.

v2:
Modify map file, increment library version,
remove deprecation notice

Signed-off-by: Kirill Rybalchenko 
---
 doc/guides/rel_notes/deprecation.rst|   6 --
 doc/guides/rel_notes/release_18_05.rst  |   2 +-
 lib/librte_ether/Makefile   |   2 +-
 lib/librte_ether/rte_ethdev.c   | 155 +---
 lib/librte_ether/rte_ethdev_version.map |   1 -
 5 files changed, 4 insertions(+), 162 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 74c18ed..6594585 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -149,12 +149,6 @@ Deprecation Notices
   as parameter. For consistency functions adding callback will return
   ``struct rte_eth_rxtx_callback \*`` instead of ``void \*``.
 
-* ethdev: The size of variables ``flow_types_mask`` in
-  ``rte_eth_fdir_info structure``, ``sym_hash_enable_mask`` and
-  ``valid_bit_mask`` in ``rte_eth_hash_global_conf`` structure
-  will be increased from 32 to 64 bits to fulfill hardware requirements.
-  This change will break existing ABI as size of the structures will increase.
-
 * ethdev: ``rte_eth_dev_get_sec_ctx()`` fix port id storage
   ``rte_eth_dev_get_sec_ctx()`` is using ``uint8_t`` for ``port_id``,
   which should be ``uint16_t``.
diff --git a/doc/guides/rel_notes/release_18_05.rst 
b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..22da411 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -128,7 +128,7 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_cryptodev.so.4
  librte_distributor.so.1
  librte_eal.so.6
- librte_ethdev.so.8
+   + librte_ethdev.so.9
  librte_eventdev.so.3
  librte_flow_classify.so.1
  librte_gro.so.1
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 3ca5782..c2f2f7d 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -16,7 +16,7 @@ LDLIBS += -lrte_mbuf
 
 EXPORT_MAP := rte_ethdev_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..78b8376 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -3490,153 +3489,8 @@ rte_eth_dev_filter_supported(uint16_t port_id,
 }
 
 int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-   enum rte_filter_type filter_type,
-   enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-   enum rte_filter_type filter_type,
-   enum rte_filter_op filter_op, void *arg)
-{
-   struct rte_eth_fdir_info_v22 {
-   enum rte_fdir_mode mode;
-   struct rte_eth_fdir_masks mask;
-   struct rte_eth_fdir_flex_conf flex_conf;
-   uint32_t guarant_spc;
-   uint32_t best_spc;
-   uint32_t flow_types_mask[1];
-   uint32_t max_flexpayload;
-   uint32_t flex_payload_unit;
-   uint32_t max_flex_payload_segment_num;
-   uint16_t flex_payload_limit;
-   uint32_t flex_bitmask_unit;
-   uint32_t max_flex_bitmask_num;
-   };
-
-   struct rte_eth_hash_global_conf_v22 {
-   enum rte_eth_hash_function hash_func;
-   uint32_t sym_hash_enable_mask[1];
-   uint32_t valid_bit_mask[1];
-   };
-
-   struct rte_eth_hash_filter_info_v22 {
-   enum rte_eth_hash_filter_info_type info_type;
-   union {
-   uint8_t enable;
-   struct rte_eth_hash_global_conf_v22 global_conf;
-   struct rte_eth_input_set_conf input_set_conf;
-   } info;
-   };
-
-   struct rte_eth_dev *dev;
-
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-   dev = &rte_eth_devices[port_id];
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
-   if (filter_op == RTE_ETH_FILTER_INFO) {
-   int retval;
-   struct rte_eth_fdir_info_v22 *fdir_info_v22;
-   struct rte_eth_fdir_info fdir_info;
-
-   fdir_info_v22 = (struct rte_eth_fdir_info_v22 *)arg;
-
-   retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-  

[dpdk-dev] [PATCH] net/vdev_netvsc: fix routed devices probing

2018-02-27 Thread Matan Azrad
NetVSC netdevices which are already routed should not be probed, because
they are used for management purposes by Hyper-V.

The broken code read the routed devices from the system file
/proc/net/route but wrongly parsed only the odd lines, so devices whose
routes were on even lines were considered unrouted and were probed.

Use the Linux netlink library to detect the routed NetVSC devices
instead of parsing the file.

Fixes: 31182fadfb21 ("net/vdev_netvsc: skip routed netvsc probing")
Cc: sta...@dpdk.org
Cc: step...@networkplumber.org

Suggested-by: Stephen Hemminger 
Signed-off-by: Matan Azrad 
---
 drivers/net/vdev_netvsc/vdev_netvsc.c | 109 +++---
 1 file changed, 86 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c 
b/drivers/net/vdev_netvsc/vdev_netvsc.c
index cbf4d59..db0080a 100644
--- a/drivers/net/vdev_netvsc/vdev_netvsc.c
+++ b/drivers/net/vdev_netvsc/vdev_netvsc.c
@@ -7,6 +7,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -207,36 +209,96 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list =
  *
  * @param[in] name
  *   Network device name.
+ * @param[in] family
+ *   Address family: AF_INET for IPv4 or AF_INET6 for IPv6.
  *
  * @return
- *   A nonzero value when interface has an route. In case of error,
- *   rte_errno is updated and 0 returned.
+ *   1 when interface has a route, negative errno value in case of error and
+ *   0 otherwise.
  */
 static int
-vdev_netvsc_has_route(const char *name)
+vdev_netvsc_has_route(const struct if_nameindex *iface,
+ const unsigned char family)
 {
-   FILE *fp;
+   /*
+* The implementation can be simpler by getifaddrs() function usage but
+* it works for IPv6 only starting from glibc 2.3.3.
+*/
+   char buf[4096];
+   int len;
int ret = 0;
-   char route[NETVSC_MAX_ROUTE_LINE_SIZE];
-   char *netdev;
-
-   fp = fopen("/proc/net/route", "r");
-   if (!fp) {
-   rte_errno = errno;
-   return 0;
+   int res;
+   int sock;
+   struct nlmsghdr *retmsg = (struct nlmsghdr *)buf;
+   struct sockaddr_nl sa;
+   struct {
+   struct nlmsghdr nlhdr;
+   struct ifaddrmsg addrmsg;
+   } msg;
+
+   if (!iface || (family != AF_INET && family != AF_INET6)) {
+   DRV_LOG(ERR, "%s", rte_strerror(EINVAL));
+   return -EINVAL;
}
-   while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) {
-   netdev = strtok(route, "\t");
-   if (strcmp(netdev, name) == 0) {
-   ret = 1;
-   break;
+   sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+   if (sock == -1) {
+   DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno));
+   return -errno;
+   }
+   memset(&sa, 0, sizeof(sa));
+   sa.nl_family = AF_NETLINK;
+   sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;
+   res = bind(sock, (struct sockaddr *)&sa, sizeof(sa));
+   if (res == -1) {
+   ret = -errno;
+   DRV_LOG(ERR, "cannot bind socket: %s", rte_strerror(errno));
+   goto close;
+   }
+   memset(&msg, 0, sizeof(msg));
+   msg.nlhdr.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
+   msg.nlhdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
+   msg.nlhdr.nlmsg_type = RTM_GETADDR;
+   msg.nlhdr.nlmsg_pid = getpid();
+   msg.addrmsg.ifa_family = family;
+   msg.addrmsg.ifa_index = iface->if_index;
+   res = send(sock, &msg, msg.nlhdr.nlmsg_len, 0);
+   if (res == -1) {
+   ret = -errno;
+   DRV_LOG(ERR, "cannot send socket message: %s",
+   rte_strerror(errno));
+   goto close;
+   }
+   memset(buf, 0, sizeof(buf));
+   len = recv(sock, buf, sizeof(buf), 0);
+   if (len == -1) {
+   ret = -errno;
+   DRV_LOG(ERR, "cannot receive socket message: %s",
+   rte_strerror(errno));
+   goto close;
+   }
+   while (NLMSG_OK(retmsg, (unsigned int)len)) {
+   struct ifaddrmsg *retaddr =
+   (struct ifaddrmsg *)NLMSG_DATA(retmsg);
+
+   if (retaddr->ifa_family == family &&
+   retaddr->ifa_index == iface->if_index) {
+   struct rtattr *retrta = IFA_RTA(retaddr);
+   int attlen = IFA_PAYLOAD(retmsg);
+
+   while (RTA_OK(retrta, attlen)) {
+   if (retrta->rta_type == IFA_ADDRESS) {
+   ret = 1;
+   DRV_LOG(DEBUG, "interface %s has IP",
+   iface->if_name);
+   goto close;
+  

[dpdk-dev] [PATCH] event/sw: add unlikely branch predict

2018-02-27 Thread Vipin Varghese
In most runs 'sw->started' holds true. Adding a branch prediction
hint for the compiler helps, as this is the first conditional check
after entering the function.

Signed-off-by: Vipin Varghese 
---
 drivers/event/sw/sw_evdev_scheduler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/event/sw/sw_evdev_scheduler.c 
b/drivers/event/sw/sw_evdev_scheduler.c
index 3106eb3..17bd4c0 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -508,7 +508,7 @@ sw_event_schedule(struct rte_eventdev *dev)
uint32_t i;
 
sw->sched_called++;
-   if (!sw->started)
+   if (unlikely(!sw->started))
return;
 
do {
-- 
2.7.4



[dpdk-dev] [dpdk-announce] DPDK 17.11.1 (LTS) released

2018-02-27 Thread Yuanhan Liu
Hi all,

Here is a new LTS release:
http://fast.dpdk.org/rel/dpdk-17.11.1.tar.xz

The git tree is at:
http://dpdk.org/browse/dpdk-stable/

Thanks.

--yliu

---
 app/Makefile   |   2 +-
 app/test-pmd/Makefile  |   4 -
 app/test-pmd/cmdline.c |  14 +-
 app/test-pmd/config.c  | 109 ++--
 app/test-pmd/flowgen.c |  10 +-
 app/test-pmd/parameters.c  |  13 +-
 app/test-pmd/testpmd.c |  92 +++
 app/test-pmd/testpmd.h |   5 +
 app/test-pmd/txonly.c  |   1 +
 buildtools/pmdinfogen/pmdinfogen.c |   5 +-
 config/common_base |   7 +-
 config/common_linuxapp |   1 +
 doc/guides/cryptodevs/openssl.rst  |  15 +-
 doc/guides/cryptodevs/qat.rst  |   1 +
 doc/guides/nics/i40e.rst   |  23 +
 doc/guides/nics/mlx4.rst   |   8 -
 doc/guides/rel_notes/release_17_11.rst | 255 +
 doc/guides/sample_app_ug/ipsec_secgw.rst   |  10 +-
 doc/guides/sample_app_ug/keep_alive.rst|   2 +-
 drivers/bus/dpaa/base/qbman/bman.h |  32 +-
 drivers/bus/dpaa/base/qbman/qman.c |   5 +
 drivers/bus/dpaa/base/qbman/qman.h |  64 ++-
 drivers/bus/dpaa/dpaa_bus.c|   4 +
 drivers/bus/dpaa/include/fsl_qman.h|   4 +-
 drivers/bus/dpaa/rte_dpaa_bus.h|   6 +-
 drivers/bus/fslmc/fslmc_vfio.c |   2 +-
 drivers/bus/fslmc/mc/fsl_mc_sys.h  |   1 -
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h|   8 +-
 drivers/bus/fslmc/rte_fslmc.h  |   8 +-
 drivers/bus/pci/linux/pci.c|  91 ++-
 drivers/bus/pci/linux/pci_vfio.c   |   2 -
 drivers/bus/pci/pci_common_uio.c   |   1 -
 drivers/bus/vdev/vdev.c|   5 +-
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c|   2 +-
 drivers/crypto/dpaa2_sec/dpaa2_sec_priv.h  |   1 +
 drivers/crypto/dpaa_sec/dpaa_sec.c |   8 +-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c   |  10 +
 drivers/crypto/qat/qat_crypto.c|  28 +-
 drivers/crypto/qat/qat_qp.c|  10 +-
 drivers/crypto/scheduler/rte_cryptodev_scheduler.c |   8 +-
 drivers/event/octeontx/Makefile|   2 +-
 drivers/event/octeontx/ssovf_worker.h  |   6 +-
 drivers/event/sw/sw_evdev.c|  46 +-
 drivers/event/sw/sw_evdev.h|   2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c  |  23 +-
 drivers/mempool/octeontx/octeontx_fpavf.h  |   6 +-
 drivers/mempool/octeontx/octeontx_mbox.c   |   4 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c|  72 +--
 drivers/net/af_packet/rte_eth_af_packet.c  |   2 +-
 drivers/net/avp/rte_avp_common.h   |   1 +
 drivers/net/bnxt/bnxt.h|   1 +
 drivers/net/bnxt/bnxt_cpr.c|   1 -
 drivers/net/bnxt/bnxt_ethdev.c |  55 +-
 drivers/net/bnxt/bnxt_filter.c |  42 +-
 drivers/net/bnxt/bnxt_hwrm.c   |  51 +-
 drivers/net/bnxt/bnxt_ring.c   |   7 +-
 drivers/net/bnxt/bnxt_ring.h   |   2 +-
 drivers/net/bnxt/bnxt_rxq.c|   2 +-
 drivers/net/bnxt/bnxt_rxr.c|   4 +-
 drivers/net/bnxt/bnxt_txr.c|  17 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c  |   3 +-
 drivers/net/bonding/rte_eth_bond_api.c |  40 +-
 drivers/net/bonding/rte_eth_bond_pmd.c |  10 +-
 drivers/net/bonding/rte_eth_bond_private.h |   3 +
 drivers/net/dpaa/dpaa_ethdev.c |  56 +-
 drivers/net/dpaa/dpaa_ethdev.h |   2 +-
 drivers/net/dpaa/dpaa_rxtx.c   |  21 +-
 drivers/net/e1000/em_ethdev.c  |   2 +-
 drivers/net/e1000/igb_ethdev.c |   7 +-
 drivers/net/e1000/igb_flow.c   |  20 +
 drivers/net/ena/ena_ethdev.c   |  10 +-
 drivers/net/enic/enic.h|  25 +-
 drivers/net/enic/enic_ethdev.c |  20 +-
 drivers/net/enic/enic_main.c   |  43 +-
 drivers/net/enic/enic_rxtx.c   |   3 +-
 drivers/net/failsafe/failsafe.c|   2 +-
 drivers/net/failsafe/failsafe_args.c   |   2 +-
 drivers/net/failsafe/failsafe_rxtx.c   |   2 +-
 drivers/net/fm10k/fm10k_ethdev.c   |   4 +-
 d

[dpdk-dev] [RFC v2 0/3] vhost: multiqueue improvements

2018-02-27 Thread Maxime Coquelin
This second revision takes Jens' comments into account; the main
change is fixing an off-by-one error in patch 2.

The series introduce support for a new protocol request that
notifies the backend with Virtio device status updates.

Main goal is to be able with Virtio 1.0 devices to start
the port even if the guest hasn't initialized all the
queue pairs of the device. This case happens for example
with Windows driver if more queue pairs are declared than
there are vCPUs.

The patch also handles reset and failed driver status to
stop the device and destroy the virtqueues.

The last patch implements a workaround for old and current
QEMUs, which send SET_VRING_ADDR requests for uninitialized
queues; this can lead to guest memory corruption if
the host application requests to disable queue
notifications.

I posted the series as an RFC, as the QEMU & specification
parts for the new request haven't been accepted yet.


Changes since RFC v1:
=
- move the virtio_status declaration to the right patch
- fix the off-by-one error when removing uninitialized queues

Maxime Coquelin (3):
  vhost: invalidate vring addresses in cleanup_vq()
  vhost: add SET_VIRTIO_STATUS support
  vhost_user: work around invalid rings addresses sent by QEMU

 lib/librte_vhost/vhost.c  |   6 ++-
 lib/librte_vhost/vhost.h  |   4 +-
 lib/librte_vhost/vhost_user.c | 113 +-
 lib/librte_vhost/vhost_user.h |   5 +-
 4 files changed, 123 insertions(+), 5 deletions(-)

-- 
2.14.3



[dpdk-dev] [RFC v2 1/3] vhost: invalidate vring addresses in cleanup_vq()

2018-02-27 Thread Maxime Coquelin
When cleaning up the virtqueue, we also need to invalidate its
addresses to be sure outdated addresses won't be used later.

Signed-off-by: Maxime Coquelin 
Reviewed-by: Jens Freimann 
---
 lib/librte_vhost/vhost.c  | 6 --
 lib/librte_vhost/vhost.h  | 3 ++-
 lib/librte_vhost/vhost_user.c | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f6f12a03b..e4281cf67 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -69,12 +69,14 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
 }
 
 void
-cleanup_vq(struct vhost_virtqueue *vq, int destroy)
+cleanup_vq(struct virtio_net *dev, struct vhost_virtqueue *vq, int destroy)
 {
if ((vq->callfd >= 0) && (destroy != 0))
close(vq->callfd);
if (vq->kickfd >= 0)
close(vq->kickfd);
+
+   vring_invalidate(dev, vq);
 }
 
 /*
@@ -89,7 +91,7 @@ cleanup_device(struct virtio_net *dev, int destroy)
vhost_backend_cleanup(dev);
 
for (i = 0; i < dev->nr_vring; i++)
-   cleanup_vq(dev->virtqueue[i], destroy);
+   cleanup_vq(dev, dev->virtqueue[i], destroy);
 }
 
 void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 58aec2e0d..481700489 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -362,7 +362,8 @@ void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
 
-void cleanup_vq(struct vhost_virtqueue *vq, int destroy);
+void cleanup_vq(struct virtio_net *dev, struct vhost_virtqueue *vq,
+   int destroy);
 void free_vq(struct vhost_virtqueue *vq);
 
 int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5c5361066..c256ebb06 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -219,7 +219,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
continue;
 
dev->virtqueue[dev->nr_vring] = NULL;
-   cleanup_vq(vq, 1);
+   cleanup_vq(dev, vq, 1);
free_vq(vq);
}
}
-- 
2.14.3



[dpdk-dev] [RFC v2 2/3] vhost: add SET_VIRTIO_STATUS support

2018-02-27 Thread Maxime Coquelin
This patch implements support for the new SET_VIRTIO_STATUS
vhost-user request.

The main use for this new request is for the backend to know
when the driver sets the DRIVER_OK status bit. Starting Virtio
1.0, we know that once the bit is set, no more queues will
be initialized.
When it happens, this patch removes all queues starting from
the first uninitialized one, so that the port starts even if
the guest driver does not use all the queues provided by QEMU.
This is for example the case with the Windows driver, which only
initializes as many queue pairs as there are vCPUs.

The second use for this request is when the status changes to
reset or failed state, the vhost port is stopped and virtqueues
cleaned and freed.

Signed-off-by: Maxime Coquelin 
Reviewed-by: Jens Freimann 
---
 lib/librte_vhost/vhost.h  |  1 +
 lib/librte_vhost/vhost_user.c | 98 +++
 lib/librte_vhost/vhost_user.h |  5 ++-
 3 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 481700489..4ebf84bec 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -241,6 +241,7 @@ struct virtio_net {
struct guest_page   *guest_pages;
 
int slave_req_fd;
+   uint8_t virtio_status;
 } __rte_cache_aligned;
 
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c256ebb06..63f501e8d 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -67,6 +67,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
+   [VHOST_USER_SET_VIRTIO_STATUS]  = "VHOST_USER_SET_VIRTIO_STATUS",
 };
 
 static uint64_t
@@ -1244,6 +1245,100 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct 
VhostUserMsg *msg)
return 0;
 }
 
+static int
+vhost_user_set_virtio_status(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+   uint8_t old_status, new_status;
+   uint32_t i;
+
+   /* As per Virtio spec, the Virtio device status is 8 bits wide */
+   if (msg->payload.u64 != (uint8_t)msg->payload.u64) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "Invalid Virtio dev status value (%lx)\n",
+   msg->payload.u64);
+   return -1;
+   }
+
+   new_status = msg->payload.u64;
+   old_status = dev->virtio_status;
+
+   if (new_status == old_status)
+   return 0;
+
+   RTE_LOG(DEBUG, VHOST_CONFIG,
+   "New Virtio device status %x (was %x)\n",
+   new_status, old_status);
+
+   dev->virtio_status = new_status;
+
+   if (new_status == 0 || new_status & VIRTIO_CONFIG_S_FAILED) {
+   /*
+* The device moved to reset or failed state,
+* stop processing the virtqueues
+*/
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
+   while (dev->nr_vring > 0) {
+   struct vhost_virtqueue *vq;
+
+   vq = dev->virtqueue[--dev->nr_vring];
+   if (!vq)
+   continue;
+
+   dev->virtqueue[dev->nr_vring] = NULL;
+   cleanup_vq(dev, vq, 1);
+   free_vq(vq);
+   }
+
+   return 0;
+   }
+
+   if ((dev->features & (1ULL << VIRTIO_F_VERSION_1)) &&
+   (new_status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+   !virtio_is_ready(dev)) {
+   /*
+* Since Virtio 1.0, we know that no more queues will be
+* setup after guest sets DRIVER_OK. So let's remove
+* uninitialized queues.
+*/
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "Driver is ready, but some queues aren't initialized\n");
+
+   /*
+* Find the first uninitialized queue.
+*
+* Note: Ideally the backend implementation should
+* support sparse virtqueues, but as long as it is
+* not the case, let's remove all queues after the
+* first uninitialized one.
+*/
+   for (i = 0; i < dev->nr_vring; i++) {
+   if (!vq_is_ready(dev->virtqueue[i]))
+   break;
+   }
+
+   while (dev->nr_vring > i) {
+   struct vhost_virtqueue *vq;
+
+   vq = dev->virtqueue[--dev->nr_vring];
+   if (!vq)
+

[dpdk-dev] [RFC v2 3/3] vhost_user: work around invalid rings addresses sent by QEMU

2018-02-27 Thread Maxime Coquelin
When the guest driver does not initialize all the queues,
QEMU currently sends SET_VRING_ADDR request for these queues.
In this case all the desc, avail and used addresses have GPA 0,
so translating them will likely succeed.

The problem is that even if the uninitialized queues remain
disabled, the host application may request to disable the
notifications using rte_vhost_enable_guest_notification().
Doing this writes 0 to the used ring's flags field, and therefore
writes 0 at guest physical address 0.

This patch adds a check to ensure all the ring addresses are
different before their translation.

When VHOST_USER_F_PROTOCOL_VIRTIO_STATUS and VIRTIO_F_VERSION_1
have been negotiated, the uninitialized queues will be removed
when the driver sets the DRIVER_OK status bit.
Otherwise, the port will never start to avoid any guest memory
corruption.

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 63f501e8d..afa7f7268 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -448,6 +448,19 @@ translate_ring_addresses(struct virtio_net *dev, int 
vq_index)
if (vq->desc && vq->avail && vq->used)
return dev;
 
+   /*
+* QEMU currently sends SET_VRING_ADDR request even for queues
+* not initialized by the guest driver. In this case, all rings
+* addresses are identical (GPA 0).
+*/
+   if (addr->desc_user_addr == addr->avail_user_addr &&
+   addr->desc_user_addr == addr->used_user_addr) {
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "Invalid rings addresses for dev %d queue %d\n",
+   dev->vid, vq_index);
+   return dev;
+   }
+
vq->desc = (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev,
vq, addr->desc_user_addr, sizeof(struct vring_desc));
if (vq->desc == 0) {
-- 
2.14.3



[dpdk-dev] [PATCH] event/sw: move stats code for better cache access

2018-02-27 Thread Vipin Varghese
The variables 'in_pkts_total' and 'out_pkts_total' are likely held in
registers. Moving the stats update to just after the scheduling loop
lets the compiler update the counters straight from those registers.

Signed-off-by: Vipin Varghese 
---
 drivers/event/sw/sw_evdev_scheduler.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/event/sw/sw_evdev_scheduler.c 
b/drivers/event/sw/sw_evdev_scheduler.c
index 17bd4c0..9143b93 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -541,6 +541,12 @@ sw_event_schedule(struct rte_eventdev *dev)
break;
} while ((int)out_pkts_total < sched_quanta);
 
+   sw->stats.tx_pkts += out_pkts_total;
+   sw->stats.rx_pkts += in_pkts_total;
+
+   sw->sched_no_iq_enqueues += (in_pkts_total == 0);
+   sw->sched_no_cq_enqueues += (out_pkts_total == 0);
+
/* push all the internal buffered QEs in port->cq_ring to the
 * worker cores: aka, do the ring transfers batched.
 */
@@ -552,10 +558,4 @@ sw_event_schedule(struct rte_eventdev *dev)
sw->ports[i].cq_buf_count = 0;
}
 
-   sw->stats.tx_pkts += out_pkts_total;
-   sw->stats.rx_pkts += in_pkts_total;
-
-   sw->sched_no_iq_enqueues += (in_pkts_total == 0);
-   sw->sched_no_cq_enqueues += (out_pkts_total == 0);
-
 }
-- 
2.7.4



[dpdk-dev] [PATCH v3 2/5] eal: don't process IPC messages before init finished

2018-02-27 Thread Anatoly Burakov
It is not possible for a primary process to receive any messages
while initializing, because the RTE_MAGIC value is not set in the
shared config, and hence no secondary process can ever spin up
during that time.

However, it is possible for a secondary process to receive messages
from the primary during initialization. We can't just drop the
messages as they may be important, and also we might need to process
replies to our own requests (e.g. VFIO) during initialization.

Therefore, add a tailq for incoming messages, and queue them up
until initialization is complete, then process them in the order
they arrived.
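The queue-then-drain pattern described above can be sketched with the sys/queue.h primitives the patch itself uses (the names below are illustrative, not the actual EAL code):

```c
#include <stdlib.h>
#include <sys/queue.h>

struct msg_entry {
	TAILQ_ENTRY(msg_entry) next;
	int payload;                /* stand-in for the real message */
};

TAILQ_HEAD(msg_queue, msg_entry);

/* Enqueue an incoming message; it is processed later, in arrival order. */
static void
msg_enqueue(struct msg_queue *q, int payload)
{
	struct msg_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return;
	e->payload = payload;
	TAILQ_INSERT_TAIL(q, e, next);
}

/* Drain the queue once initialization completes; returns count drained. */
static int
msg_drain(struct msg_queue *q)
{
	struct msg_entry *e;
	int n = 0;

	while ((e = TAILQ_FIRST(q)) != NULL) {
		TAILQ_REMOVE(q, e, next);
		/* process e->payload here */
		free(e);
		n++;
	}
	return n;
}
```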

Signed-off-by: Anatoly Burakov 
---

Notes:
v3: check for init_complete after receiving message

v2: no changes

 lib/librte_eal/common/eal_common_proc.c | 52 +
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 3a1088e..a6e24e6 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 #include "eal_filesystem.h"
@@ -58,6 +59,18 @@ struct mp_msg_internal {
struct rte_mp_msg msg;
 };
 
+struct message_queue_entry {
+   TAILQ_ENTRY(message_queue_entry) next;
+   struct mp_msg_internal msg;
+   struct sockaddr_un sa;
+};
+
+/** Double linked list of received messages. */
+TAILQ_HEAD(message_queue, message_queue_entry);
+
+static struct message_queue message_queue =
+   TAILQ_HEAD_INITIALIZER(message_queue);
+
 struct sync_request {
TAILQ_ENTRY(sync_request) next;
int reply_received;
@@ -276,12 +289,41 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un 
*s)
 static void *
 mp_handle(void *arg __rte_unused)
 {
-   struct mp_msg_internal msg;
-   struct sockaddr_un sa;
-
+   struct message_queue_entry *cur_msg, *next_msg, *new_msg = NULL;
while (1) {
-   if (read_msg(&msg, &sa) == 0)
-   process_msg(&msg, &sa);
+   /* we want to process all messages in order of their arrival,
+* but status of init_complete may change while we're iterating
+* the tailq. so, store it here and check once every iteration.
+*/
+   int init_complete;
+
+   if (new_msg == NULL)
+   new_msg = malloc(sizeof(*new_msg));
+   if (read_msg(&new_msg->msg, &new_msg->sa) == 0) {
+   /* we successfully read the message, so enqueue it */
+   TAILQ_INSERT_TAIL(&message_queue, new_msg, next);
+   new_msg = NULL;
+   } /* reuse new_msg for next message if we couldn't read_msg */
+
+   init_complete = internal_config.init_complete;
+
+   /* tailq only accessed here, so no locking needed */
+   TAILQ_FOREACH_SAFE(cur_msg, &message_queue, next, next_msg) {
+   /* secondary process should not process any incoming
+* requests until its initialization is complete, but
+* it is allowed to process replies to its own queries.
+*/
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
+   !init_complete &&
+   cur_msg->msg.type != MP_REP)
+   continue;
+
+   TAILQ_REMOVE(&message_queue, cur_msg, next);
+
+   process_msg(&cur_msg->msg, &cur_msg->sa);
+
+   free(cur_msg);
+   }
}
 
return NULL;
-- 
2.7.4


[dpdk-dev] [PATCH v3 1/5] eal: add internal flag indicating init has completed

2018-02-27 Thread Anatoly Burakov
Currently, primary process initialization is finalized by setting
the RTE_MAGIC value in the shared config. However, it is not
possible to check whether secondary process initialization has
completed. Add such a value to internal config.

Signed-off-by: Anatoly Burakov 
---

Notes:
This patch is dependent upon earlier IPC fixes patchset [1].

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/

v3: no changes

v2: no changes

 lib/librte_eal/common/eal_common_options.c | 1 +
 lib/librte_eal/common/eal_internal_cfg.h   | 2 ++
 lib/librte_eal/linuxapp/eal/eal.c  | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 9f2f8d2..0be80cb 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -194,6 +194,7 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
internal_cfg->vmware_tsc_map = 0;
internal_cfg->create_uio_dev = 0;
internal_cfg->user_mbuf_pool_ops_name = NULL;
+   internal_cfg->init_complete = 0;
 }
 
 static int
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index 1169fcc..4e2c2e6 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -56,6 +56,8 @@ struct internal_config {
/**< user defined mbuf pool ops name */
unsigned num_hugepage_sizes;  /**< how many sizes on this system */
struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES];
+   unsigned int init_complete;
+   /**< indicates whether EAL has completed initialization */
 };
 extern struct internal_config internal_config; /**< Global EAL configuration. 
*/
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 38306bf..2ecd07b 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -669,6 +669,8 @@ rte_eal_mcfg_complete(void)
/* ALL shared mem_config related INIT DONE */
if (rte_config.process_type == RTE_PROC_PRIMARY)
rte_config.mem_config->magic = RTE_MAGIC;
+
+   internal_config.init_complete = 1;
 }
 
 /*
-- 
2.7.4


[dpdk-dev] [PATCH v3 5/5] eal: don't hardcode socket filter value in IPC

2018-02-27 Thread Anatoly Burakov
Currently, the filter value is hardcoded and disconnected from the
actual value returned by eal_mp_socket_path(). Fix this by deriving
the filter value from eal_mp_socket_path() instead.

Signed-off-by: Anatoly Burakov 
---

Notes:
v3: no changes

v2: no changes

 lib/librte_eal/common/eal_common_proc.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 7856a7b..4cf3aa6 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -506,16 +506,17 @@ int
 rte_mp_channel_init(void)
 {
char thread_name[RTE_MAX_THREAD_NAME_LEN];
-   char *path;
+   char path[PATH_MAX];
int dir_fd;
pthread_t tid;
 
-   snprintf(mp_filter, PATH_MAX, ".%s_unix_*",
-internal_config.hugefile_prefix);
+   /* create filter path */
+   create_socket_path("*", path, sizeof(path));
+   snprintf(mp_filter, sizeof(mp_filter), "%s", basename(path));
 
-   path = strdup(eal_mp_socket_path());
-   snprintf(mp_dir_path, PATH_MAX, "%s", dirname(path));
-   free(path);
+   /* path may have been modified, so recreate it */
+   create_socket_path("*", path, sizeof(path));
+   snprintf(mp_dir_path, sizeof(mp_dir_path), "%s", dirname(path));
 
/* lock the directory */
dir_fd = open(mp_dir_path, O_RDONLY);
-- 
2.7.4


[dpdk-dev] [PATCH v3 4/5] eal: prevent secondary process init while sending messages

2018-02-27 Thread Anatoly Burakov
Currently, it is possible to spin up a secondary process while
either sendmsg or request is in progress. Fix this by adding
directory locks during init, sendmsg and requests.

Signed-off-by: Anatoly Burakov 
---

Notes:
v3: no changes

v2: no changes

 lib/librte_eal/common/eal_common_proc.c | 47 -
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 7c87971..7856a7b 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -507,6 +507,7 @@ rte_mp_channel_init(void)
 {
char thread_name[RTE_MAX_THREAD_NAME_LEN];
char *path;
+   int dir_fd;
pthread_t tid;
 
snprintf(mp_filter, PATH_MAX, ".%s_unix_*",
@@ -516,14 +517,32 @@ rte_mp_channel_init(void)
snprintf(mp_dir_path, PATH_MAX, "%s", dirname(path));
free(path);
 
+   /* lock the directory */
+   dir_fd = open(mp_dir_path, O_RDONLY);
+   if (dir_fd < 0) {
+   RTE_LOG(ERR, EAL, "failed to open %s: %s\n",
+   mp_dir_path, strerror(errno));
+   return -1;
+   }
+
+   if (flock(dir_fd, LOCK_EX)) {
+   RTE_LOG(ERR, EAL, "failed to lock %s: %s\n",
+   mp_dir_path, strerror(errno));
+   close(dir_fd);
+   return -1;
+   }
+
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
unlink_sockets(mp_filter)) {
RTE_LOG(ERR, EAL, "failed to unlink mp sockets\n");
+   close(dir_fd);
return -1;
}
 
-   if (open_socket_fd() < 0)
+   if (open_socket_fd() < 0) {
+   close(dir_fd);
return -1;
+   }
 
if (pthread_create(&tid, NULL, mp_handle, NULL) < 0) {
RTE_LOG(ERR, EAL, "failed to create mp thead: %s\n",
@@ -536,6 +555,11 @@ rte_mp_channel_init(void)
/* try best to set thread name */
snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "rte_mp_handle");
rte_thread_setname(tid, thread_name);
+
+   /* unlock the directory */
+   flock(dir_fd, LOCK_UN);
+   close(dir_fd);
+
return 0;
 }
 
@@ -650,6 +674,14 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type)
return -1;
}
dir_fd = dirfd(mp_dir);
+   /* lock the directory to prevent processes spinning up while we send */
+   if (flock(dir_fd, LOCK_EX)) {
+   RTE_LOG(ERR, EAL, "Unable to lock directory %s\n",
+   mp_dir_path);
+   rte_errno = errno;
+   closedir(mp_dir);
+   return -1;
+   }
while ((ent = readdir(mp_dir))) {
char path[PATH_MAX];
const char *peer_name;
@@ -673,6 +705,8 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type)
else if (active > 0 && send_msg(path, msg, type) < 0)
ret = -1;
}
+   /* unlock the dir */
+   flock(dir_fd, LOCK_UN);
 
closedir(mp_dir);
return ret;
@@ -832,6 +866,15 @@ rte_mp_request(struct rte_mp_msg *req, struct rte_mp_reply 
*reply,
}
dir_fd = dirfd(mp_dir);
 
+   /* lock the directory to prevent processes spinning up while we send */
+   if (flock(dir_fd, LOCK_EX)) {
+   RTE_LOG(ERR, EAL, "Unable to lock directory %s\n",
+   mp_dir_path);
+   closedir(mp_dir);
+   rte_errno = errno;
+   return -1;
+   }
+
while ((ent = readdir(mp_dir))) {
const char *peer_name;
char path[PATH_MAX];
@@ -857,6 +900,8 @@ rte_mp_request(struct rte_mp_msg *req, struct rte_mp_reply 
*reply,
if (mp_request_one(path, req, reply, &end))
ret = -1;
}
+   /* unlock the directory */
+   flock(dir_fd, LOCK_UN);
 
closedir(mp_dir);
return ret;
-- 
2.7.4


[dpdk-dev] [PATCH v3 3/5] eal: use locks to determine if secondary process is active

2018-02-27 Thread Anatoly Burakov
Previously, IPC would remove sockets it considered to be "inactive"
based on whether they had responded. Change this to create lock
files in addition to socket files, so that we can determine whether
a secondary process is active before attempting to communicate with
it. That way, we can distinguish secondaries that are alive but not
responding from those that have already died.
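The liveness test this patch relies on can be sketched as follows (paths and helper names are illustrative, not the actual EAL code): a process holds an exclusive flock() on its lock file for its whole lifetime; a peer probes liveness with a non-blocking lock attempt, which fails while the owner is alive and succeeds once it has died.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if the lock file at 'path' is held (owner alive),
 * 0 if it could be locked (owner gone), -1 on error. */
static int
lockfile_is_held(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);
	int held;

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
		/* we got the lock, so nobody was holding it */
		flock(fd, LOCK_UN);
		held = 0;
	} else {
		held = 1;
	}
	close(fd);
	return held;
}
```

flock() locks belong to the open file description, so a probe through a freshly opened descriptor conflicts with the owner's lock even within the same process, and the lock vanishes automatically when the owner exits.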

Signed-off-by: Anatoly Burakov 
---

Notes:
v3: no changes

v2: no changes

 lib/librte_eal/common/eal_common_proc.c | 204 +++-
 1 file changed, 175 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index a6e24e6..7c87971 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +33,7 @@
 #include "eal_internal_cfg.h"
 
 static int mp_fd = -1;
+static int lock_fd = -1;
 static char mp_filter[PATH_MAX];   /* Filter for secondary process sockets */
 static char mp_dir_path[PATH_MAX]; /* The directory path for all mp sockets */
 static pthread_mutex_t mp_mutex_action = PTHREAD_MUTEX_INITIALIZER;
@@ -104,6 +106,46 @@ find_sync_request(const char *dst, const char *act_name)
return r;
 }
 
+static void
+create_socket_path(const char *name, char *buf, int len)
+{
+   const char *prefix = eal_mp_socket_path();
+   if (strlen(name) > 0)
+   snprintf(buf, len, "%s_%s", prefix, name);
+   else
+   snprintf(buf, len, "%s", prefix);
+}
+
+static void
+create_lockfile_path(const char *name, char *buf, int len)
+{
+   const char *prefix = eal_mp_socket_path();
+   if (strlen(name) > 1)
+   snprintf(buf, len, "%slock_%s", prefix, name);
+   else
+   snprintf(buf, len, "%slock", prefix);
+}
+
+static const char *
+get_peer_name(const char *socket_full_path)
+{
+   char buf[PATH_MAX] = {0};
+   int len;
+
+   /* primary process has no peer name */
+   if (strcmp(socket_full_path, eal_mp_socket_path()) == 0)
+   return NULL;
+
+   /* construct dummy socket file name - make it one character long so that
+* we hit the code path where underscores are added
+*/
+   create_socket_path("a", buf, sizeof(buf));
+
+   /* we want to get everything after /path/.rte_unix_, so discard 'a' */
+   len = strlen(buf) - 1;
+   return &socket_full_path[len];
+}
+
 int
 rte_eal_primary_proc_alive(const char *config_file_path)
 {
@@ -332,8 +374,29 @@ mp_handle(void *arg __rte_unused)
 static int
 open_socket_fd(void)
 {
+   char peer_name[PATH_MAX] = {0};
+   char lockfile[PATH_MAX] = {0};
struct sockaddr_un un;
-   const char *prefix = eal_mp_socket_path();
+
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+   snprintf(peer_name, sizeof(peer_name), "%d_%"PRIx64,
+getpid(), rte_rdtsc());
+
+   /* try to create lockfile */
+   create_lockfile_path(peer_name, lockfile, sizeof(lockfile));
+
+   lock_fd = open(lockfile, O_CREAT | O_RDWR);
+   if (lock_fd < 0) {
+   RTE_LOG(ERR, EAL, "failed to open '%s': %s\n", lockfile,
+   strerror(errno));
+   return -1;
+   }
+   if (flock(lock_fd, LOCK_EX | LOCK_NB)) {
+   RTE_LOG(ERR, EAL, "failed to lock '%s': %s\n", lockfile,
+   strerror(errno));
+   return -1;
+   }
+   /* no need to downgrade to shared lock */
 
mp_fd = socket(AF_UNIX, SOCK_DGRAM, 0);
if (mp_fd < 0) {
@@ -343,13 +406,11 @@ open_socket_fd(void)
 
memset(&un, 0, sizeof(un));
un.sun_family = AF_UNIX;
-   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-   snprintf(un.sun_path, sizeof(un.sun_path), "%s", prefix);
-   else {
-   snprintf(un.sun_path, sizeof(un.sun_path), "%s_%d_%"PRIx64,
-prefix, getpid(), rte_rdtsc());
-   }
+
+   create_socket_path(peer_name, un.sun_path, sizeof(un.sun_path));
+
unlink(un.sun_path); /* May still exist since last run */
+
if (bind(mp_fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
RTE_LOG(ERR, EAL, "failed to bind %s: %s\n",
un.sun_path, strerror(errno));
@@ -361,6 +422,44 @@ open_socket_fd(void)
return mp_fd;
 }
 
+/* find corresponding lock file and try to lock it */
+static int
+socket_is_active(const char *peer_name)
+{
+   char lockfile[PATH_MAX] = {0};
+   int fd, ret = -1;
+
+   /* construct lockfile filename */
+   create_lockfile_path(peer_name, lockfile, sizeof(lockfile));
+
+   /* try to lock it */
+   fd = open(lockfile, O_CREAT | O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot open '%s': %s\n", lockfile,
+   strerror(

[dpdk-dev] meson support : cross compile issues

2018-02-27 Thread Hemant Agrawal
Hi,

How do we set the cross-compile kernel path? E.g. something equivalent to
RTE_KERNELDIR in the Makefile build system.

 *   Currently the igb_uio compilation fails.
 *   Also, there is no flag to disable igb_uio compilation, e.g.
     CONFIG_RTE_EAL_IGB_UIO=n.



Another minor issue observed: although the cross compiler is set to GCC
6.3, the GCC 7 flags (GCC 7 being my host compiler version) are getting
enabled, causing the following errors:

../drivers/bus/dpaa/dpaa_bus.c: At top level:

cc1: warning: unrecognized command line option ‘-Wno-format-truncation’

cc1: warning: unrecognized command line option ‘-Wno-address-of-packed-member’

Regards,
Hemant



[dpdk-dev] [PATCH 1/4] eal: use sizeof to avoid a double use of a define

2018-02-27 Thread Olivier Matz
Only a cosmetic change: the *_LEN defines are already used
when defining the buffer. Using sizeof() ensures that the length
stays consistent, even if the definition is modified.

Signed-off-by: Olivier Matz 
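The point of the change can be illustrated with a minimal sketch (the buffer length and helper below are made up for illustration, not the real EAL definitions):

```c
#include <stdio.h>

#define AFFINITY_STR_LEN 256 /* illustrative stand-in for RTE_CPU_AFFINITY_STR_LEN */

/* Toy stand-in for eal_thread_dump_affinity(): fills 'str', returns 0
 * on success. Passing sizeof(cpuset) instead of the define means the
 * length argument can never drift from the buffer's real size. */
static int
dump_affinity(char *str, size_t size)
{
	return snprintf(str, size, "[0,1]") < (int)size ? 0 : -1;
}

static int
example(void)
{
	char cpuset[AFFINITY_STR_LEN];

	/* sizeof(cpuset), not AFFINITY_STR_LEN: stays correct even if
	 * the buffer definition changes later. */
	return dump_affinity(cpuset, sizeof(cpuset));
}
```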
---
 lib/librte_eal/bsdapp/eal/eal.c  | 2 +-
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 2 +-
 lib/librte_eal/linuxapp/eal/eal.c| 4 ++--
 lib/librte_eal/linuxapp/eal/eal_thread.c | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 4eafcb5ad..0b0fb9973 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -632,7 +632,7 @@ rte_eal_init(int argc, char **argv)
 
eal_thread_init_master(rte_config.master_lcore);
 
-   ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
+   ret = eal_thread_dump_affinity(cpuset, sizeof(cpuset));
 
RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s%s])\n",
rte_config.master_lcore, thread_id, cpuset,
diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index d602daf81..309b58726 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -119,7 +119,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
if (eal_thread_set_affinity() < 0)
rte_panic("cannot set affinity\n");
 
-   ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
+   ret = eal_thread_dump_affinity(cpuset, sizeof(cpuset));
 
RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%p;cpuset=[%s%s])\n",
lcore_id, thread_id, cpuset, ret == 0 ? "" : "...");
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 38306bf5c..1cb87ca25 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -888,7 +888,7 @@ rte_eal_init(int argc, char **argv)
 
eal_thread_init_master(rte_config.master_lcore);
 
-   ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
+   ret = eal_thread_dump_affinity(cpuset, sizeof(cpuset));
 
RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%x;cpuset=[%s%s])\n",
rte_config.master_lcore, (int)thread_id, cpuset,
@@ -919,7 +919,7 @@ rte_eal_init(int argc, char **argv)
rte_panic("Cannot create thread\n");
 
/* Set thread_name for aid in debugging. */
-   snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
+   snprintf(thread_name, sizeof(thread_name),
"lcore-slave-%d", i);
ret = rte_thread_setname(lcore_config[i].thread_id,
thread_name);
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 08e150b77..f652ff988 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -119,7 +119,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
if (eal_thread_set_affinity() < 0)
rte_panic("cannot set affinity\n");
 
-   ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
+   ret = eal_thread_dump_affinity(cpuset, sizeof(cpuset));
 
RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%x;cpuset=[%s%s])\n",
lcore_id, (int)thread_id, cpuset, ret == 0 ? "" : "...");
-- 
2.11.0



[dpdk-dev] [PATCH 0/4] fix control thread affinities

2018-02-27 Thread Olivier Matz
Some parts of DPDK use their own management threads. Most of the time,
the affinity of the thread is not properly set: it should not be scheduled
on the dataplane cores, because interrupting them can cause packet loss.

This patchset introduces a new wrapper for thread creation that does
the job automatically, avoiding code duplication.
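The core idea can be sketched like this (a minimal illustration, not the actual EAL implementation): control threads get a cpuset that is the complement of the dataplane cores, so they never interrupt packet processing.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Illustrative helper: build a cpuset containing every core in
 * [0, ncores) that is NOT part of the dataplane cpuset, so control
 * threads can be pinned away from packet-processing cores. */
static void
ctrl_cpuset(cpu_set_t *out, const cpu_set_t *dataplane, int ncores)
{
	int i;

	CPU_ZERO(out);
	for (i = 0; i < ncores; i++)
		if (!CPU_ISSET(i, dataplane))
			CPU_SET(i, out);
}
```

The resulting set would typically be handed to pthread_setaffinity_np() for each control thread, which is what the wrapper introduced in this series automates.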

Olivier Matz (4):
  eal: use sizeof to avoid a double use of a define
  eal: new function to create control threads
  eal: set name when creating a control thread
  eal: set affinity for control threads

 drivers/net/kni/Makefile   |  1 +
 drivers/net/kni/rte_eth_kni.c  |  3 +-
 lib/librte_eal/bsdapp/eal/eal.c|  2 +-
 lib/librte_eal/bsdapp/eal/eal_thread.c |  2 +-
 lib/librte_eal/common/eal_common_thread.c  | 70 ++
 lib/librte_eal/common/include/rte_lcore.h  | 26 ++
 lib/librte_eal/linuxapp/eal/eal.c  |  4 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 17 ++-
 lib/librte_eal/linuxapp/eal/eal_thread.c   |  2 +-
 lib/librte_eal/linuxapp/eal/eal_timer.c| 12 +
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c | 10 +---
 lib/librte_eal/rte_eal_version.map |  1 +
 lib/librte_pdump/Makefile  |  1 +
 lib/librte_pdump/rte_pdump.c   | 13 ++---
 lib/librte_vhost/socket.c  |  7 +--
 15 files changed, 123 insertions(+), 48 deletions(-)

-- 
2.11.0



[dpdk-dev] [PATCH 2/4] eal: new function to create control threads

2018-02-27 Thread Olivier Matz
Many parts of DPDK use their own management threads. Introduce a new
wrapper for thread creation that will be extended in next commits to set
the name and affinity.

To be consistent with other DPDK APIs, the return value is negative in
case of error, which was not the case for pthread_create().

Signed-off-by: Olivier Matz 
---
 drivers/net/kni/Makefile   |  1 +
 drivers/net/kni/rte_eth_kni.c  |  2 +-
 lib/librte_eal/common/eal_common_thread.c  |  8 
 lib/librte_eal/common/include/rte_lcore.h  | 21 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  6 +++---
 lib/librte_eal/linuxapp/eal/eal_timer.c|  2 +-
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c |  2 +-
 lib/librte_eal/rte_eal_version.map |  1 +
 lib/librte_pdump/Makefile  |  1 +
 lib/librte_pdump/rte_pdump.c   |  5 +++--
 lib/librte_vhost/socket.c  |  6 +++---
 11 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/drivers/net/kni/Makefile b/drivers/net/kni/Makefile
index 01eaef056..562e8d2da 100644
--- a/drivers/net/kni/Makefile
+++ b/drivers/net/kni/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_kni.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lpthread
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_kni
diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
index dc4e65f5d..26718eb3e 100644
--- a/drivers/net/kni/rte_eth_kni.c
+++ b/drivers/net/kni/rte_eth_kni.c
@@ -149,7 +149,7 @@ eth_kni_dev_start(struct rte_eth_dev *dev)
}
 
if (internals->no_request_thread == 0) {
-   ret = pthread_create(&internals->thread, NULL,
+   ret = rte_ctrl_thread_create(&internals->thread, NULL,
kni_handle_request, internals);
if (ret) {
RTE_LOG(ERR, PMD,
diff --git a/lib/librte_eal/common/eal_common_thread.c 
b/lib/librte_eal/common/eal_common_thread.c
index 40902e49b..efbccddbc 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -140,3 +140,11 @@ eal_thread_dump_affinity(char *str, unsigned size)
 
return ret;
 }
+
+__rte_experimental int
+rte_ctrl_thread_create(pthread_t *thread,
+   const pthread_attr_t *attr,
+   void *(*start_routine)(void *), void *arg)
+{
+   return pthread_create(thread, attr, start_routine, arg);
+}
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 047222030..f19075a88 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -247,6 +247,27 @@ void rte_thread_get_affinity(rte_cpuset_t *cpusetp);
 int rte_thread_setname(pthread_t id, const char *name);
 
 /**
+ * Create a control thread.
+ *
+ * Wrapper to pthread_create().
+ *
+ * @param thread
+ *   Filled with the thread id of the new created thread.
+ * @param attr
+ *   Attributes for the new thread.
+ * @param start_routine
+ *   Function to be executed by the new thread.
+ * @param arg
+ *   Argument passed to start_routine.
+ * @return
+ *   On success, returns 0; on error, it returns a negative value
+ *   corresponding to the error number.
+ */
+__rte_experimental int
+rte_ctrl_thread_create(pthread_t *thread, const pthread_attr_t *attr,
+   void *(*start_routine)(void *), void *arg);
+
+/**
  * Test if the core supplied has a specific role
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f7b..d927fb45d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -860,10 +860,10 @@ rte_eal_intr_init(void)
}
 
/* create the host thread to wait/handle the interrupt */
-   ret = pthread_create(&intr_thread, NULL,
+   ret = rte_ctrl_thread_create(&intr_thread, NULL,
eal_intr_thread_main, NULL);
if (ret != 0) {
-   rte_errno = ret;
+   rte_errno = -ret;
RTE_LOG(ERR, EAL,
"Failed to create thread for interrupt handling\n");
} else {
@@ -876,7 +876,7 @@ rte_eal_intr_init(void)
"Failed to set thread name for interrupt handling\n");
}
 
-   return -ret;
+   return ret;
 }
 
 static void
diff --git a/lib/librte_eal/linuxapp/eal/eal_timer.c 
b/lib/librte_eal/linuxapp/eal/eal_timer.c
index 161322f23..f12d2e134 100644
--- a/lib/librte_eal/linuxapp/eal/eal_timer.c
+++ b/lib/librte_eal/linuxapp/eal/eal_timer.c
@@ -178,7 +178,7 @@ rte_eal_hpet_init(int make_default)
 
/* create a thread that will increment a global variable for
 * msb (hpet is 32 bits by default under linu

[dpdk-dev] [PATCH 3/4] eal: set name when creating a control thread

2018-02-27 Thread Olivier Matz
To avoid code duplication, add a parameter to rte_ctrl_thread_create()
to specify the name of the thread.

This requires adding a wrapper for the thread start routine,
rte_thread_init(), which first waits until the thread is configured.

Signed-off-by: Olivier Matz 
---
 drivers/net/kni/rte_eth_kni.c  |  3 +-
 lib/librte_eal/common/eal_common_thread.c  | 52 --
 lib/librte_eal/common/include/rte_lcore.h  |  7 +++-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 13 ++-
 lib/librte_eal/linuxapp/eal/eal_timer.c| 12 +-
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c | 10 +
 lib/librte_pdump/rte_pdump.c   | 10 +
 lib/librte_vhost/socket.c  |  7 ++--
 8 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
index 26718eb3e..6b036d8e1 100644
--- a/drivers/net/kni/rte_eth_kni.c
+++ b/drivers/net/kni/rte_eth_kni.c
@@ -149,7 +149,8 @@ eth_kni_dev_start(struct rte_eth_dev *dev)
}
 
if (internals->no_request_thread == 0) {
-   ret = rte_ctrl_thread_create(&internals->thread, NULL,
+   ret = rte_ctrl_thread_create(&internals->thread,
+   "kni_handle_request", NULL,
kni_handle_request, internals);
if (ret) {
RTE_LOG(ERR, PMD,
diff --git a/lib/librte_eal/common/eal_common_thread.c 
b/lib/librte_eal/common/eal_common_thread.c
index efbccddbc..575b03e9d 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -141,10 +142,53 @@ eal_thread_dump_affinity(char *str, unsigned size)
return ret;
 }
 
+
+struct rte_thread_ctrl_params {
+   void *(*start_routine)(void *);
+   void *arg;
+   pthread_barrier_t configured;
+};
+
+static void *rte_thread_init(void *arg)
+{
+   struct rte_thread_ctrl_params *params = arg;
+   void *(*start_routine)(void *) = params->start_routine;
+   void *routine_arg = params->arg;
+
+   pthread_barrier_wait(¶ms->configured);
+
+   return start_routine(routine_arg);
+}
+
 __rte_experimental int
-rte_ctrl_thread_create(pthread_t *thread,
-   const pthread_attr_t *attr,
-   void *(*start_routine)(void *), void *arg)
+rte_ctrl_thread_create(pthread_t *thread, const char *name,
+   const pthread_attr_t *attr,
+   void *(*start_routine)(void *), void *arg)
 {
-   return pthread_create(thread, attr, start_routine, arg);
+   struct rte_thread_ctrl_params params = {
+   .start_routine = start_routine,
+   .arg = arg,
+   };
+   int ret;
+
+   pthread_barrier_init(¶ms.configured, NULL, 2);
+
+   ret = pthread_create(thread, attr, rte_thread_init, (void *)¶ms);
+   if (ret != 0)
+   return ret;
+
+   if (name != NULL) {
+   ret = rte_thread_setname(*thread, name);
+   if (ret < 0)
+   goto fail;
+   }
+
+   pthread_barrier_wait(¶ms.configured);
+
+   return 0;
+
+fail:
+   pthread_kill(*thread, SIGTERM);
+   pthread_join(*thread, NULL);
+   return ret;
 }
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index f19075a88..f3d9bbb91 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -249,10 +249,12 @@ int rte_thread_setname(pthread_t id, const char *name);
 /**
  * Create a control thread.
  *
- * Wrapper to pthread_create().
+ * Wrapper to pthread_create() and pthread_setname_np().
  *
  * @param thread
  *   Filled with the thread id of the new created thread.
+ * @param name
+ *   The name of the control thread (max 16 characters including '\0').
  * @param attr
  *   Attributes for the new thread.
  * @param start_routine
@@ -264,7 +266,8 @@ int rte_thread_setname(pthread_t id, const char *name);
  *   corresponding to the error number.
  */
 __rte_experimental int
-rte_ctrl_thread_create(pthread_t *thread, const pthread_attr_t *attr,
+rte_ctrl_thread_create(pthread_t *thread, const char *name,
+   const pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg);
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index d927fb45d..3f184bed3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -844,7 +844,7 @@ eal_intr_thread_main(__rte_unused void *arg)
 int
 rte_eal_intr_init(void)
 {
-   int ret = 0, ret_1 = 0;
+   int ret = 0;
char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
/* init the global interrupt source head */
@@ -860,20 +860,13 @@ rte_eal_intr_in

[dpdk-dev] [PATCH 4/4] eal: set affinity for control threads

2018-02-27 Thread Olivier Matz
The management threads must not interfere with the dataplane or service
cores. Set the affinity of these threads accordingly.

Signed-off-by: Olivier Matz 
---
 lib/librte_eal/common/eal_common_thread.c | 20 +++-
 lib/librte_eal/common/include/rte_lcore.h |  4 +++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c 
b/lib/librte_eal/common/eal_common_thread.c
index 575b03e9d..f2e588c97 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include "eal_private.h"
 #include "eal_thread.h"
 
 RTE_DECLARE_PER_LCORE(unsigned , _socket_id);
@@ -169,7 +170,9 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
.start_routine = start_routine,
.arg = arg,
};
-   int ret;
+   unsigned int lcore_id;
+   rte_cpuset_t cpuset;
+   int set_affinity, ret;
 
pthread_barrier_init(¶ms.configured, NULL, 2);
 
@@ -183,6 +186,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
goto fail;
}
 
+   set_affinity = 0;
+   CPU_ZERO(&cpuset);
+   for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   if (eal_cpu_detected(lcore_id) &&
+   rte_lcore_has_role(lcore_id, ROLE_OFF)) {
+   CPU_SET(lcore_id, &cpuset);
+   set_affinity = 1;
+   }
+   }
+   if (set_affinity) {
+   ret = pthread_setaffinity_np(*thread, sizeof(cpuset), &cpuset);
+   if (ret < 0)
+   goto fail;
+   }
+
pthread_barrier_wait(¶ms.configured);
 
return 0;
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index f3d9bbb91..354717c5d 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -249,7 +249,9 @@ int rte_thread_setname(pthread_t id, const char *name);
 /**
  * Create a control thread.
  *
- * Wrapper to pthread_create() and pthread_setname_np().
+ * Wrapper to pthread_create(), pthread_setname_np() and
+ * pthread_setaffinity_np(). The dataplane and service lcores are
+ * excluded from the affinity of the new thread.
  *
  * @param thread
  *   Filled with the thread id of the new created thread.
-- 
2.11.0



Re: [dpdk-dev] meson support : cross compile issues

2018-02-27 Thread Bruce Richardson
On Tue, Feb 27, 2018 at 02:38:53PM +, Hemant Agrawal wrote:
> Hi,
> 
> How do we set CROSS COMPILE kernel path support. E.g. something equivalent to 
> RTE_KERNELDIR for Makefile
> 
>  *   Currently the Igb_uio  compilation fails.
>  *   Also, there is no check to disable igb_uio compilation by flag e.g. 
> CONFIG_RTE_EAL_IGB_UIO=n
> 
> 

I have not had time to look at the cross-compilation of kernel modules
yet, so patches welcome. :-)
However, it should be possible to disable the kernel modules generally
using "enable_kmods" option (see meson_options.txt)

> 
> Other minor issue observed is that though the cross compile is set as 6.3, 
> the gcc 7 flags (which is my host compiler version) is getting enabled.  
> Causing following errors:
> 

Actually, this is a gcc quirk. GCC does not report an error for
command-line flags disabling unknown warnings unless other errors are
displayed. This means that when meson probes the options, GCC reports
them as supported. It also means that seeing these warnings is not a
problem in itself - just fix the other errors and the warnings about
the flags will disappear.

See: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
"However, if the -Wno- form is used, the behavior is slightly different:
no diagnostic is produced for -Wno-unknown-warning unless other
diagnostics are being produced. This allows the use of new -Wno- options
with old compilers, but if something goes wrong, the compiler warns that
an unrecognized option is present"


> ../drivers/bus/dpaa/dpaa_bus.c: At top level:
> 
> cc1: warning: unrecognized command line option ‘-Wno-format-truncation’
> 
> cc1: warning: unrecognized command line option ‘-Wno-address-of-packed-member’
> 
> Regards,
> Hemant
> 
Regards,
/Bruce


[dpdk-dev] [PATCH] eal: add asynchronous request API to DPDK IPC

2018-02-27 Thread Anatoly Burakov
This API is similar to the blocking API that is already present,
but the reply will be received in a separate callback by the caller.

Under the hood, we create a separate thread to deal with replies to
asynchronous requests. It waits to be notified by the main thread, or
wakes up on a timer (it wakes itself up every minute regardless of
whether it was signalled; if there are no requests in the queue,
nothing is done and it goes back to sleep for another minute).

Signed-off-by: Anatoly Burakov 
---

Notes:
This patch is dependent upon previously published patchsets
for IPC fixes [1] and improvements [2].

rte_mp_action_unregister and rte_mp_async_reply_unregister
do the same thing - should we perhaps make it one function?

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/
[2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Improvements/

 lib/librte_eal/common/eal_common_proc.c | 528 +---
 lib/librte_eal/common/include/rte_eal.h |  71 +
 2 files changed, 564 insertions(+), 35 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index bdea6d6..c5ae569 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -41,7 +41,11 @@ static pthread_mutex_t mp_mutex_action = 
PTHREAD_MUTEX_INITIALIZER;
 struct action_entry {
TAILQ_ENTRY(action_entry) next;
char action_name[RTE_MP_MAX_NAME_LEN];
-   rte_mp_t action;
+   RTE_STD_C11
+   union {
+   rte_mp_t action;
+   rte_mp_async_reply_t reply;
+   };
 };
 
 /** Double linked list of actions. */
@@ -73,13 +77,37 @@ TAILQ_HEAD(message_queue, message_queue_entry);
 static struct message_queue message_queue =
TAILQ_HEAD_INITIALIZER(message_queue);
 
+enum mp_request_type {
+   REQUEST_TYPE_SYNC,
+   REQUEST_TYPE_ASYNC
+};
+
+struct async_request_shared_param {
+   struct rte_mp_reply *user_reply;
+   struct timespec *end;
+   int n_requests_processed;
+};
+
+struct async_request_param {
+   struct async_request_shared_param *param;
+};
+
+struct sync_request_param {
+   pthread_cond_t cond;
+};
+
 struct sync_request {
TAILQ_ENTRY(sync_request) next;
-   int reply_received;
+   enum mp_request_type type;
char dst[PATH_MAX];
struct rte_mp_msg *request;
-   struct rte_mp_msg *reply;
-   pthread_cond_t cond;
+   struct rte_mp_msg *reply_msg;
+   int reply_received;
+   RTE_STD_C11
+   union {
+   struct sync_request_param sync;
+   struct async_request_param async;
+   };
 };
 
 TAILQ_HEAD(sync_request_list, sync_request);
@@ -87,9 +115,12 @@ TAILQ_HEAD(sync_request_list, sync_request);
 static struct {
struct sync_request_list requests;
pthread_mutex_t lock;
+   pthread_cond_t async_cond;
 } sync_requests = {
.requests = TAILQ_HEAD_INITIALIZER(sync_requests.requests),
-   .lock = PTHREAD_MUTEX_INITIALIZER
+   .lock = PTHREAD_MUTEX_INITIALIZER,
+   .async_cond = PTHREAD_COND_INITIALIZER
+   /**< used in async requests only */
 };
 
 static struct sync_request *
@@ -201,53 +232,97 @@ validate_action_name(const char *name)
return 0;
 }
 
-int __rte_experimental
-rte_mp_action_register(const char *name, rte_mp_t action)
+static struct action_entry *
+action_register(const char *name)
 {
struct action_entry *entry;
 
if (validate_action_name(name))
-   return -1;
+   return NULL;
 
entry = malloc(sizeof(struct action_entry));
if (entry == NULL) {
rte_errno = ENOMEM;
-   return -1;
+   return NULL;
}
strcpy(entry->action_name, name);
-   entry->action = action;
 
-   pthread_mutex_lock(&mp_mutex_action);
if (find_action_entry_by_name(name) != NULL) {
pthread_mutex_unlock(&mp_mutex_action);
rte_errno = EEXIST;
free(entry);
-   return -1;
+   return NULL;
}
TAILQ_INSERT_TAIL(&action_entry_list, entry, next);
-   pthread_mutex_unlock(&mp_mutex_action);
-   return 0;
+
+   /* async and sync replies are handled by different threads, so even
+* though they share a pointer in a union, one will never trigger in
+* place of the other.
+*/
+
+   return entry;
 }
 
-void __rte_experimental
-rte_mp_action_unregister(const char *name)
+static void
+action_unregister(const char *name)
 {
struct action_entry *entry;
 
if (validate_action_name(name))
return;
 
-   pthread_mutex_lock(&mp_mutex_action);
entry = find_action_entry_by_name(name);
if (entry == NULL) {
-   pthread_mutex_unlock(&mp_mutex_action);
return;
}
TAILQ_REMOVE(&action_ent

[dpdk-dev] [PATCH] ethdev: return diagnostic when setting MAC address

2018-02-27 Thread Olivier Matz
Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a
return code is added to notify the caller (librte_ether) if an error
occurred in the PMD.

The new default MAC address is now copied in dev->data->mac_addrs[0]
only if the operation is successful.

The patch also updates all the PMDs accordingly.

Signed-off-by: Olivier Matz 
---

Hi,

This patch is the following of the discussion we had in this thread:
https://dpdk.org/dev/patchwork/patch/32284/

I did my best to keep the behavior consistent inside each PMD. The
behavior of eth_mac_addr_set() is inspired by other functions in the
same PMD, usually eth_mac_addr_add(). For instance:
- dpaa and dpaa2 return 0 on error.
- some PMDs (bnxt, mlx5, ...?) do not return a -errno code (-1 or
  positive values).
- some PMDs (avf, tap) check if the address is the same and return 0
  in that case. This could go in generic code?

I tried to use the following errors when relevant:
- -EPERM when a VF is not allowed to do a change
- -ENOTSUP if the function is not supported
- -EIO if this is an unknown error from lower layer (hw or sdk)
- -EINVAL for other unknown errors
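As an illustration only (the `dummy_dev` type, its fields, and the
`dummy_mac_addr_set` helper below are hypothetical, not any real PMD),
the proposed return-code convention could be sketched as:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; the real types live in rte_ether.h and
 * rte_ethdev_core.h. */
struct ether_addr { unsigned char addr_bytes[6]; };
struct dummy_dev {
	struct ether_addr mac;
	bool is_vf;   /* VF not allowed to change the default MAC */
	bool hw_ok;   /* lower layer (hw or sdk) healthy */
};

static bool is_multicast(const struct ether_addr *a)
{
	return a->addr_bytes[0] & 0x01;
}

/* Sketch of the convention: return 0 on success, or a negative errno
 * chosen per the guidelines listed above. */
static int dummy_mac_addr_set(struct dummy_dev *dev,
			      const struct ether_addr *addr)
{
	if (dev->is_vf)
		return -EPERM;   /* VF may not perform the change */
	if (is_multicast(addr))
		return -EINVAL;  /* not a valid unicast address */
	if (!dev->hw_ok)
		return -EIO;     /* unknown error from the lower layer */
	/* Record the address only on success, mirroring the rule that
	 * librte_ether copies into data->mac_addrs[0] only then. */
	memcpy(&dev->mac, addr, sizeof(*addr));
	return 0;
}
```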

Please, PMD maintainers, feel free to comment if you have specific
needs for your driver.

Thanks
Olivier


 doc/guides/rel_notes/deprecation.rst|  8 
 drivers/net/ark/ark_ethdev.c|  9 ++---
 drivers/net/avf/avf_ethdev.c| 12 
 drivers/net/bnxt/bnxt_ethdev.c  | 10 ++
 drivers/net/bonding/rte_eth_bond_pmd.c  |  8 ++--
 drivers/net/dpaa/dpaa_ethdev.c  |  4 +++-
 drivers/net/dpaa2/dpaa2_ethdev.c|  6 --
 drivers/net/e1000/igb_ethdev.c  | 12 +++-
 drivers/net/failsafe/failsafe_ops.c | 16 +---
 drivers/net/i40e/i40e_ethdev.c  | 24 ++-
 drivers/net/i40e/i40e_ethdev_vf.c   | 12 +++-
 drivers/net/ixgbe/ixgbe_ethdev.c| 13 -
 drivers/net/mlx4/mlx4.h |  2 +-
 drivers/net/mlx4/mlx4_ethdev.c  |  7 +--
 drivers/net/mlx5/mlx5.h |  2 +-
 drivers/net/mlx5/mlx5_mac.c |  7 +--
 drivers/net/mrvl/mrvl_ethdev.c  |  7 ++-
 drivers/net/null/rte_eth_null.c |  3 ++-
 drivers/net/octeontx/octeontx_ethdev.c  |  4 +++-
 drivers/net/qede/qede_ethdev.c  |  7 +++
 drivers/net/sfc/sfc_ethdev.c| 14 +-
 drivers/net/szedata2/rte_eth_szedata2.c |  3 ++-
 drivers/net/tap/rte_eth_tap.c   | 34 +
 drivers/net/virtio/virtio_ethdev.c  | 15 ++-
 drivers/net/vmxnet3/vmxnet3_ethdev.c|  5 +++--
 lib/librte_ether/rte_ethdev.c   |  7 +--
 lib/librte_ether/rte_ethdev_core.h  |  2 +-
 test/test/virtual_pmd.c |  3 ++-
 28 files changed, 159 insertions(+), 97 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 74c18ed7c..2bf360f0d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -134,14 +134,6 @@ Deprecation Notices
   between the VF representor and the VF or the parent PF. Those new fields
   are to be included in ``rte_eth_dev_info`` struct.
 
-* ethdev: The prototype and the behavior of
-  ``dev_ops->eth_mac_addr_set()`` will change in v18.05. A return code
-  will be added to notify the caller if an error occurred in the PMD. In
-  ``rte_eth_dev_default_mac_addr_set()``, the new default MAC address
-  will be copied in ``dev->data->mac_addrs[0]`` only if the operation is
-  successful. This modification will only impact the PMDs, not the
-  applications.
-
 * ethdev: functions add rx/tx callback will return named opaque type
   ``rte_eth_add_rx_callback()``, ``rte_eth_add_first_rx_callback()`` and
   ``rte_eth_add_tx_callback()`` functions currently return callback object as
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
index ff87c20e2..3fc40cd74 100644
--- a/drivers/net/ark/ark_ethdev.c
+++ b/drivers/net/ark/ark_ethdev.c
@@ -69,7 +69,7 @@ static int eth_ark_dev_set_link_down(struct rte_eth_dev *dev);
 static int eth_ark_dev_stats_get(struct rte_eth_dev *dev,
  struct rte_eth_stats *stats);
 static void eth_ark_dev_stats_reset(struct rte_eth_dev *dev);
-static void eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
+static int eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
 struct ether_addr *mac_addr);
 static int eth_ark_macaddr_add(struct rte_eth_dev *dev,
   struct ether_addr *mac_addr,
@@ -887,16 +887,19 @@ eth_ark_macaddr_remove(struct rte_eth_dev *dev, uint32_t 
index)
  ark->user_data[dev->data->port_id]);
 }
 
-static void
+static int
 eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
 struct ether_addr *mac_addr)
 {
struct ark_adapter *ark =

[dpdk-dev] [PATCH] eal/ppc: fix rte_smp_mb for a compilation error with else clause

2018-02-27 Thread Gowrishankar
From: Gowrishankar Muthukrishnan 

This patch fixes a compilation error with rte_smp_mb() when an else
clause follows it, as in test_barrier.c.
Fixes: 05c3fd7110 ("eal/ppc: atomic operations for IBM Power")
Cc: sta...@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan 
---
 lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h 
b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
index 39fce7b..1821774 100644
--- a/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h
@@ -55,7 +55,7 @@
  * Guarantees that the LOAD and STORE operations generated before the
  * barrier occur before the LOAD and STORE operations generated after.
  */
-#define rte_mb()  {asm volatile("sync" : : : "memory"); }
+#define rte_mb()  asm volatile("sync" : : : "memory")
 
 /**
  * Write memory barrier.
-- 
1.9.1
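The failure mode being fixed can be reproduced outside DPDK: a braced
statement macro followed by `;` terminates the if branch, so a
subsequent else no longer parses. A minimal sketch under stated
assumptions (plain GCC inline asm with an empty body, not the ppc
`sync` instruction, and a made-up MB() macro name):

```c
#include <assert.h>

/* With a braced definition such as
 *     #define MB() { __asm__ volatile("" ::: "memory"); }
 * the expansion of `MB();` is `{...};` -- the stray semicolon ends the
 * if statement, leaving the `else` below with nothing to attach to,
 * and compilation fails. The fixed form expands to a single
 * expression statement: */
#define MB() __asm__ volatile("" ::: "memory")

static int pick(int flag)
{
	if (flag)
		MB();      /* fine with the fixed macro */
	else
		return 1;  /* syntax error with the braced macro */
	return 0;
}
```

The same reasoning is why statement-like macros are conventionally
wrapped in `do { ... } while (0)` when a bare expression is not enough.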



Re: [dpdk-dev] [PATCH] maintainers: resign from GSO lib maintenance

2018-02-27 Thread Zhang, Helin


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Mark Kavanagh
> Sent: Tuesday, February 27, 2018 7:01 PM
> To: dev@dpdk.org
> Cc: Hu, Jiayu; Kavanagh, Mark B
> Subject: [dpdk-dev] [PATCH] maintainers: resign from GSO lib maintenance
> 
> I will not be directly working on the DPDK project anymore.
> 
> Signed-off-by: Mark Kavanagh 
Acked-by: Helin Zhang 



Re: [dpdk-dev] [PATCH 03/18] ethdev: introduce new tunnel VXLAN-GPE

2018-02-27 Thread Mohammad Abdul Awal


On 26/02/2018 15:09, Xueming Li wrote:

diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
index 45daa91..fe02ad8 100644
--- a/lib/librte_net/rte_ether.h
+++ b/lib/librte_net/rte_ether.h
@@ -310,6 +310,31 @@ struct vxlan_hdr {
  /**< VXLAN tunnel header length. */
  
  /**

+ * VXLAN-GPE protocol header.
+ * Contains the 8-bit flag, 8-bit next-protocol, 24-bit VXLAN Network
+ * Identifier and Reserved fields (16 bits and 8 bits).
+ */
+struct vxlan_gpe_hdr {
+   uint8_t vx_flags; /**< flag (8). */
+   uint8_t reserved[2]; /**< Reserved (16). */
+   uint8_t proto; /**< next-protocol (8). */
+   uint32_t vx_vni;   /**< VNI (24) + Reserved (8). */
+} __attribute__((__packed__));
+
+/* VXLAN-GPE next protocol types */
+#define VXLAN_GPE_TYPE_IPv4 1 /**< IPv4 Protocol. */
+#define VXLAN_GPE_TYPE_IPv6 2 /**< IPv6 Protocol. */
+#define VXLAN_GPE_TYPE_ETH  3 /**< Ethernet Protocol. */
+#define VXLAN_GPE_TYPE_NSH  4 /**< NSH Protocol. */
+#define VXLAN_GPE_TYPE_MPLS 5 /**< MPLS Protocol. */
+#define VXLAN_GPE_TYPE_GBP  6 /**< GBP Protocol. */
+#define VXLAN_GPE_TYPE_VBNG 7 /**< vBNG Protocol. */
+
+#define ETHER_VXLAN_GPE_HLEN (sizeof(struct udp_hdr) + \
+ sizeof(struct vxlan_gpe_hdr))
+/**< VXLAN-GPE tunnel header length. */
+
+/**
   * Extract VLAN tag information into mbuf
   *
   * Software version of VLAN stripping
Should we define the VXLAN-GPE protocol header and related macros in a
separate file (say lib/librte_net/rte_vxlan_gpe.h)?
I can see that the VXLAN header is also defined in rte_ether.h, but we
should consider moving that VXLAN definition into a separate header
file (rte_vxlan.h) as well.


Regards,
Awal.
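For what it's worth, a consumer of the header proposed above would
extract the 24-bit VNI roughly as follows. This standalone sketch
mirrors the struct from the patch and uses the libc byte-order helpers
rather than rte_be_to_cpu_32; the `vxlan_gpe_vni` helper is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h> /* ntohl/htonl */

/* Mirror of the vxlan_gpe_hdr proposed in the patch above. */
struct vxlan_gpe_hdr {
	uint8_t vx_flags;     /**< flag (8). */
	uint8_t reserved[2];  /**< Reserved (16). */
	uint8_t proto;        /**< next-protocol (8). */
	uint32_t vx_vni;      /**< VNI (24) + Reserved (8), big endian. */
} __attribute__((__packed__));

/* On the wire the VNI occupies the upper 24 bits of vx_vni, so it is
 * recovered by converting to host order and shifting out the
 * reserved low byte. */
static uint32_t vxlan_gpe_vni(const struct vxlan_gpe_hdr *h)
{
	return ntohl(h->vx_vni) >> 8;
}
```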


[dpdk-dev] [PATCH] event/sw: code refactor for counter set

2018-02-27 Thread Vipin Varghese
Counter variable 'out_pkts' had been set to 0, then updated. This
change eliminates the double assignment in favor of a direct assignment.

Signed-off-by: Vipin Varghese 
---
 drivers/event/sw/sw_evdev_scheduler.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/event/sw/sw_evdev_scheduler.c 
b/drivers/event/sw/sw_evdev_scheduler.c
index 9143b93..e3a41e0 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -532,8 +532,7 @@ sw_event_schedule(struct rte_eventdev *dev)
} while (in_pkts > 4 &&
(int)in_pkts_this_iteration < sched_quanta);
 
-   out_pkts = 0;
-   out_pkts += sw_schedule_qid_to_cq(sw);
+   out_pkts = sw_schedule_qid_to_cq(sw);
out_pkts_total += out_pkts;
in_pkts_total += in_pkts_this_iteration;
 
-- 
2.7.4



[dpdk-dev] tunnel endpoint hw acceleration enablement

2018-02-27 Thread Doherty, Declan
Invite for a DPDK community call to discuss the tunnel endpoint hw acceleration 
proposal in this RFC 
(http://dpdk.org/ml/archives/dev/2017-December/084676.html) and the related 
community feedback.

Proposed agenda:

- Summary of the RFC proposal, treating TEPs as standalone entities which 
flows get added to/removed from.

- Community feedback, managing TEPs purely within the scope of rte_flow as a 
property of individual flows.

- Pro's/con's of each approach.

- Downstream users' feedback/thoughts. I'm hoping to get some participation 
from the OvS-DPDK community, as our proposal was shaped with the view of 
enabling tunnel-endpoint encap/decap with full vswitch offload.

- next steps.

Regards
Declan


Join Skype Meeting (or the Skype Web App if you have trouble joining)
Join by phone: +1(916)356-2663 (or your local bridge access #), bridge 5 (Global)
Conference ID: 150042341




[dpdk-dev] Issue building for ppc64le

2018-02-27 Thread Marco Varlese
Hi,

Is anybody else experiencing issues with building DPDK 17.11 for ppc64le?
Any help would be very much appreciated.

I get the below error:

== START ==
[  326s] gcc -Wp,-MD,./.power_manager.o.d.tmp  -m64 -pthread -fPIC   -
DRTE_MACHINE_CPUFLAG_PPC64 -DRTE_MACHINE_CPUFLAG_ALTIVEC
-DRTE_MACHINE_CPUFLAG_VSX  -I/home/abuild/rpmbuild/BUILD/dpdk-17.11/ppc_64-
power8-linuxapp-gcc/examples/vm_power_manager/ppc_64-power8-linuxapp-gcc/include
 
-I/home/abuild/rpmbuild/BUILD/dpdk-17.11/ppc_64-power8-linuxapp-gcc/include
-include /home/abuild/rpmbuild/BUILD/dpdk-17.11/ppc_64-power8-linuxapp-
gcc/include/rte_config.h -O3 -I/home/abuild/rpmbuild/BUILD/dpdk-
17.11/lib/librte_power/ -W -Wall -Wstrict-prototypes -Wmissing-prototypes
-Wmissing-declarations -Wold-style-definition -Wpointer-arith -Wcast-align
-Wnested-externs -Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef
-Wwrite-strings -Wimplicit-fallthrough=2 -Wno-format-truncation   -
DVERSION="17.11" -o power_manager.o -c /home/abuild/rpmbuild/BUILD/dpdk-
17.11/examples/vm_power_manager/power_manager.c 
[  327s] /home/abuild/rpmbuild/BUILD/dpdk-
17.11/examples/vm_power_manager/main.c:61:10: fatal error: rte_pmd_ixgbe.h: No
such file or directory
[  327s]  #include 
[  327s]   ^
[  327s] compilation terminated.
[  327s] make[4]: *** [/home/abuild/rpmbuild/BUILD/dpdk-
17.11/mk/internal/rte.compile-pre.mk:140: main.o] Error 1
[  327s] make[4]: *** Waiting for unfinished jobs
== END ==

Cheers,
-- 
Marco V

SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg


[dpdk-dev] [PATCH 1/4] net/ixgbe: support VLAN strip per queue offloading in PF

2018-02-27 Thread Wei Dai
VLAN stripping is a per-queue offload in the PF. With this patch
it can be enabled or disabled on any Rx queue of the PF.

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 109 +--
 drivers/net/ixgbe/ixgbe_ethdev.h |   4 +-
 drivers/net/ixgbe/ixgbe_pf.c |   5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   1 +
 drivers/net/ixgbe/ixgbe_rxtx.h   |   1 +
 5 files changed, 51 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4483258..73755d2 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2001,64 +2001,6 @@ ixgbe_vlan_hw_strip_enable(struct rte_eth_dev *dev, 
uint16_t queue)
ixgbe_vlan_hw_strip_bitmap_set(dev, queue, 1);
 }
 
-void
-ixgbe_vlan_hw_strip_disable_all(struct rte_eth_dev *dev)
-{
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t ctrl;
-   uint16_t i;
-   struct ixgbe_rx_queue *rxq;
-
-   PMD_INIT_FUNC_TRACE();
-
-   if (hw->mac.type == ixgbe_mac_82598EB) {
-   ctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
-   ctrl &= ~IXGBE_VLNCTRL_VME;
-   IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, ctrl);
-   } else {
-   /* Other 10G NIC, the VLAN strip can be setup per queue in 
RXDCTL */
-   for (i = 0; i < dev->data->nb_rx_queues; i++) {
-   rxq = dev->data->rx_queues[i];
-   ctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
-   ctrl &= ~IXGBE_RXDCTL_VME;
-   IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(rxq->reg_idx), ctrl);
-
-   /* record those setting for HW strip per queue */
-   ixgbe_vlan_hw_strip_bitmap_set(dev, i, 0);
-   }
-   }
-}
-
-void
-ixgbe_vlan_hw_strip_enable_all(struct rte_eth_dev *dev)
-{
-   struct ixgbe_hw *hw =
-   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t ctrl;
-   uint16_t i;
-   struct ixgbe_rx_queue *rxq;
-
-   PMD_INIT_FUNC_TRACE();
-
-   if (hw->mac.type == ixgbe_mac_82598EB) {
-   ctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
-   ctrl |= IXGBE_VLNCTRL_VME;
-   IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, ctrl);
-   } else {
-   /* Other 10G NIC, the VLAN strip can be setup per queue in 
RXDCTL */
-   for (i = 0; i < dev->data->nb_rx_queues; i++) {
-   rxq = dev->data->rx_queues[i];
-   ctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
-   ctrl |= IXGBE_RXDCTL_VME;
-   IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(rxq->reg_idx), ctrl);
-
-   /* record those setting for HW strip per queue */
-   ixgbe_vlan_hw_strip_bitmap_set(dev, i, 1);
-   }
-   }
-}
-
 static void
 ixgbe_vlan_hw_extend_disable(struct rte_eth_dev *dev)
 {
@@ -2114,14 +2056,57 @@ ixgbe_vlan_hw_extend_enable(struct rte_eth_dev *dev)
 */
 }
 
+void
+ixgbe_vlan_hw_strip_config(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   uint32_t ctrl;
+   uint16_t i;
+   struct ixgbe_rx_queue *rxq;
+   bool on;
+
+   PMD_INIT_FUNC_TRACE();
+
+   if (hw->mac.type == ixgbe_mac_82598EB) {
+   if (rxmode->offloads & DEV_RX_OFFLOAD_VLAN_STRIP) {
+   ctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+   ctrl |= IXGBE_VLNCTRL_VME;
+   IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, ctrl);
+   } else {
+   ctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+   ctrl &= ~IXGBE_VLNCTRL_VME;
+   IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, ctrl);
+   }
+   } else {
+   /*
+* Other 10G NIC, the VLAN strip can be setup
+* per queue in RXDCTL
+*/
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   rxq = dev->data->rx_queues[i];
+   ctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(rxq->reg_idx));
+   if (rxq->offloads & DEV_RX_OFFLOAD_VLAN_STRIP) {
+   ctrl |= IXGBE_RXDCTL_VME;
+   on = TRUE;
+   } else {
+   ctrl &= ~IXGBE_RXDCTL_VME;
+   on = FALSE;
+   }
+   IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(rxq->reg_idx), ctrl);
+
+   /* record those setting for HW strip per queue */
+   ixgbe_vlan_hw_strip_bitmap_set(dev, i, on);
+   }
+   }
+}
+
 static int
 ixgbe_vlan_offload_set(

[dpdk-dev] [PATCH 0/4] ixgbe: convert to new offloads API

2018-02-27 Thread Wei Dai
This patch set adds support for per-queue VLAN strip offloading
in the ixgbe PF and VF, and converts the ixgbe PF and VF to the
new offloads API.

Wei Dai (4):
  net/ixgbe: support VLAN strip per queue offloading in PF
  net/ixgbe: support VLAN strip per queue offloading in VF
  net/ixgbe: convert to new Rx offloads API
  net/ixgbe: convert to new Tx offloads API

 drivers/net/ixgbe/ixgbe_ethdev.c  | 243 +-
 drivers/net/ixgbe/ixgbe_ethdev.h  |   4 +-
 drivers/net/ixgbe/ixgbe_ipsec.c   |  13 +-
 drivers/net/ixgbe/ixgbe_pf.c  |   5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c| 209 ++---
 drivers/net/ixgbe/ixgbe_rxtx.h|  12 ++
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |   2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |   2 +-
 8 files changed, 318 insertions(+), 172 deletions(-)

-- 
2.7.5



[dpdk-dev] [PATCH 4/4] net/ixgbe: convert to new Tx offloads API

2018-02-27 Thread Wei Dai
The ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Tx offloads API.

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 40 +++--
 drivers/net/ixgbe/ixgbe_ipsec.c  |  5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   | 65 +---
 drivers/net/ixgbe/ixgbe_rxtx.h   |  8 +
 4 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index b9a23eb..1f4881e 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -3647,28 +3647,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->rx_queue_offload_capa = ixgbe_get_rx_queue_offloads(dev);
dev_info->rx_offload_capa = (ixgbe_get_rx_port_offloads(dev) |
 dev_info->rx_queue_offload_capa);
-
-   dev_info->tx_offload_capa =
-   DEV_TX_OFFLOAD_VLAN_INSERT |
-   DEV_TX_OFFLOAD_IPV4_CKSUM  |
-   DEV_TX_OFFLOAD_UDP_CKSUM   |
-   DEV_TX_OFFLOAD_TCP_CKSUM   |
-   DEV_TX_OFFLOAD_SCTP_CKSUM  |
-   DEV_TX_OFFLOAD_TCP_TSO;
-
-   if (hw->mac.type == ixgbe_mac_82599EB ||
-   hw->mac.type == ixgbe_mac_X540)
-   dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_MACSEC_INSERT;
-
-   if (hw->mac.type == ixgbe_mac_X550 ||
-   hw->mac.type == ixgbe_mac_X550EM_x ||
-   hw->mac.type == ixgbe_mac_X550EM_a)
-   dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
-
-#ifdef RTE_LIBRTE_SECURITY
-   if (dev->security_ctx)
-   dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY;
-#endif
+   dev_info->tx_queue_offload_capa = 0;
+   dev_info->tx_offload_capa = ixgbe_get_tx_port_offlaods(dev);
 
dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
@@ -3690,7 +3670,9 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.tx_free_thresh = IXGBE_DEFAULT_TX_FREE_THRESH,
.tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-   ETH_TXQ_FLAGS_NOOFFLOADS,
+ETH_TXQ_FLAGS_NOOFFLOADS |
+ETH_TXQ_FLAGS_IGNORE,
+   .offloads = 0,
};
 
dev_info->rx_desc_lim = rx_desc_lim;
@@ -3774,12 +3756,8 @@ ixgbevf_dev_info_get(struct rte_eth_dev *dev,
dev_info->rx_queue_offload_capa = ixgbe_get_rx_queue_offloads(dev);
dev_info->rx_offload_capa = (ixgbe_get_rx_port_offloads(dev) |
 dev_info->rx_queue_offload_capa);
-   dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
-   DEV_TX_OFFLOAD_IPV4_CKSUM  |
-   DEV_TX_OFFLOAD_UDP_CKSUM   |
-   DEV_TX_OFFLOAD_TCP_CKSUM   |
-   DEV_TX_OFFLOAD_SCTP_CKSUM  |
-   DEV_TX_OFFLOAD_TCP_TSO;
+   dev_info->tx_queue_offload_capa = 0;
+   dev_info->tx_offload_capa = ixgbe_get_tx_port_offlaods(dev);
 
dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
@@ -3801,7 +3779,9 @@ ixgbevf_dev_info_get(struct rte_eth_dev *dev,
.tx_free_thresh = IXGBE_DEFAULT_TX_FREE_THRESH,
.tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-   ETH_TXQ_FLAGS_NOOFFLOADS,
+ETH_TXQ_FLAGS_NOOFFLOADS |
+ETH_TXQ_FLAGS_IGNORE,
+   .offloads = 0,
};
 
dev_info->rx_desc_lim = rx_desc_lim;
diff --git a/drivers/net/ixgbe/ixgbe_ipsec.c b/drivers/net/ixgbe/ixgbe_ipsec.c
index 29e4728..de7ed36 100644
--- a/drivers/net/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ixgbe/ixgbe_ipsec.c
@@ -599,8 +599,11 @@ ixgbe_crypto_enable_ipsec(struct rte_eth_dev *dev)
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
uint32_t reg;
uint64_t rx_offloads;
+   uint64_t tx_offloads;
 
rx_offloads = dev->data->dev_conf.rxmode.offloads;
+   tx_offloads = dev->data->dev_conf.txmode.offloads;
+
/* sanity checks */
if (rx_offloads & DEV_RX_OFFLOAD_TCP_LRO) {
PMD_DRV_LOG(ERR, "RSC and IPsec not supported");
@@ -634,7 +637,7 @@ ixgbe_crypto_enable_ipsec(struct rte_eth_dev *dev)
return -1;
}
}
-   if (dev->data->dev_conf.txmode.offloads & DEV_TX_OFFLOAD_SECURITY) {
+   if (tx_offloads & DEV_TX_OFFLOAD_SECURITY) {
IXGBE_WRITE_REG(hw, IXGBE_SECTXCTRL,
IXGBE_SECTXCTRL_STORE_FORWARD);

[dpdk-dev] [PATCH 3/4] net/ixgbe: convert to new Rx offloads API

2018-02-27 Thread Wei Dai
The ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
This patch adds support for the new Rx offloads API.

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c  |  88 +-
 drivers/net/ixgbe/ixgbe_ipsec.c   |   8 +-
 drivers/net/ixgbe/ixgbe_rxtx.c| 143 ++
 drivers/net/ixgbe/ixgbe_rxtx.h|   3 +
 drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |   2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |   2 +-
 6 files changed, 180 insertions(+), 66 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 8bb67ba..b9a23eb 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2105,19 +2105,22 @@ ixgbe_vlan_hw_strip_config(struct rte_eth_dev *dev)
 static int
 ixgbe_vlan_offload_set(struct rte_eth_dev *dev, int mask)
 {
+   struct rte_eth_rxmode *rxmode;
+   rxmode = &dev->data->dev_conf.rxmode;
+
if (mask & ETH_VLAN_STRIP_MASK) {
ixgbe_vlan_hw_strip_config(dev);
}
 
if (mask & ETH_VLAN_FILTER_MASK) {
-   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   if (rxmode->offloads & DEV_RX_OFFLOAD_VLAN_FILTER)
ixgbe_vlan_hw_filter_enable(dev);
else
ixgbe_vlan_hw_filter_disable(dev);
}
 
if (mask & ETH_VLAN_EXTEND_MASK) {
-   if (dev->data->dev_conf.rxmode.hw_vlan_extend)
+   if (rxmode->offloads & DEV_RX_OFFLOAD_VLAN_EXTEND)
ixgbe_vlan_hw_extend_enable(dev);
else
ixgbe_vlan_hw_extend_disable(dev);
@@ -2353,6 +2356,15 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
adapter->rx_bulk_alloc_allowed = true;
adapter->rx_vec_allowed = true;
 
+   /*
+* Header split and VLAN strip are per queue offload features,
+* clear them first and set them if they are enabled on any Rx queue.
+* This is for set_rx_function() called later.
+*/
+   if (dev->data->dev_conf.rxmode.ignore_offload_bitfield)
+   dev->data->dev_conf.rxmode.offloads &=
+   ~(ixgbe_get_rx_port_offloads(dev));
+
return 0;
 }
 
@@ -3632,30 +3644,9 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
else
dev_info->max_vmdq_pools = ETH_64_POOLS;
dev_info->vmdq_queue_num = dev_info->max_rx_queues;
-   dev_info->rx_offload_capa =
-   DEV_RX_OFFLOAD_VLAN_STRIP |
-   DEV_RX_OFFLOAD_IPV4_CKSUM |
-   DEV_RX_OFFLOAD_UDP_CKSUM  |
-   DEV_RX_OFFLOAD_TCP_CKSUM  |
-   DEV_RX_OFFLOAD_CRC_STRIP;
-
-   /*
-* RSC is only supported by 82599 and x540 PF devices in a non-SR-IOV
-* mode.
-*/
-   if ((hw->mac.type == ixgbe_mac_82599EB ||
-hw->mac.type == ixgbe_mac_X540) &&
-   !RTE_ETH_DEV_SRIOV(dev).active)
-   dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
-
-   if (hw->mac.type == ixgbe_mac_82599EB ||
-   hw->mac.type == ixgbe_mac_X540)
-   dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_MACSEC_STRIP;
-
-   if (hw->mac.type == ixgbe_mac_X550 ||
-   hw->mac.type == ixgbe_mac_X550EM_x ||
-   hw->mac.type == ixgbe_mac_X550EM_a)
-   dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM;
+   dev_info->rx_queue_offload_capa = ixgbe_get_rx_queue_offloads(dev);
+   dev_info->rx_offload_capa = (ixgbe_get_rx_port_offloads(dev) |
+dev_info->rx_queue_offload_capa);
 
dev_info->tx_offload_capa =
DEV_TX_OFFLOAD_VLAN_INSERT |
@@ -3675,10 +3666,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
 
 #ifdef RTE_LIBRTE_SECURITY
-   if (dev->security_ctx) {
-   dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY;
+   if (dev->security_ctx)
dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY;
-   }
 #endif
 
dev_info->default_rxconf = (struct rte_eth_rxconf) {
@@ -3689,6 +3678,7 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
},
.rx_free_thresh = IXGBE_DEFAULT_RX_FREE_THRESH,
.rx_drop_en = 0,
+   .offloads = 0,
};
 
dev_info->default_txconf = (struct rte_eth_txconf) {
@@ -3781,11 +3771,9 @@ ixgbevf_dev_info_get(struct rte_eth_dev *dev,
dev_info->max_vmdq_pools = ETH_16_POOLS;
else
dev_info->max_vmdq_pools = ETH_64_POOLS;
-   dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
-   DEV_RX_OFFLOAD_IPV4_CKSUM |
-   

[dpdk-dev] [PATCH 2/4] net/ixgbe: support VLAN strip per queue offloading in VF

2018-02-27 Thread Wei Dai
VLAN strip is a per-queue offload in the VF. With this patch
it can be enabled or disabled on any Rx queue of the VF.

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 73755d2..8bb67ba 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -5215,15 +5215,17 @@ ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask)
 {
struct ixgbe_hw *hw =
IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_rx_queue *rxq;
uint16_t i;
int on = 0;
 
/* VF function only support hw strip feature, others are not support */
if (mask & ETH_VLAN_STRIP_MASK) {
-   on = !!(dev->data->dev_conf.rxmode.hw_vlan_strip);
-
-   for (i = 0; i < hw->mac.max_rx_queues; i++)
+   for (i = 0; i < hw->mac.max_rx_queues; i++) {
+   rxq = dev->data->rx_queues[i];
+   on = !!(rxq->offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
ixgbevf_vlan_strip_queue_set(dev, i, on);
+   }
}
 
return 0;
-- 
2.7.5



[dpdk-dev] [PATCH v2 02/10] lib/librte_vhost: add virtio-crypto user message structure

2018-02-27 Thread Fan Zhang
This patch adds virtio-crypto spec user message structure to
vhost_user.

Signed-off-by: Fan Zhang 
---
 lib/librte_vhost/vhost_user.c |  2 ++
 lib/librte_vhost/vhost_user.h | 31 ++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 6a90d2a96..9d736a24f 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -50,6 +50,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
+   [VHOST_USER_CRYPTO_CREATE_SESS] = "VHOST_USER_CRYPTO_CREATE_SESS",
+   [VHOST_USER_CRYPTO_CLOSE_SESS] = "VHOST_USER_CRYPTO_CLOSE_SESS",
 };
 
 static uint64_t
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 354615c8b..99febb7fa 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -20,13 +20,15 @@
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3
 #define VHOST_USER_PROTOCOL_F_NET_MTU 4
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
+#define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 
 #define VHOST_USER_PROTOCOL_FEATURES   ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
 					(1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
 					(1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
 					(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
 					(1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
-					(1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ))
+					(1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
+					(1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION))
 
 typedef enum VhostUserRequest {
VHOST_USER_NONE = 0,
@@ -52,6 +54,8 @@ typedef enum VhostUserRequest {
VHOST_USER_NET_SET_MTU = 20,
VHOST_USER_SET_SLAVE_REQ_FD = 21,
VHOST_USER_IOTLB_MSG = 22,
+   VHOST_USER_CRYPTO_CREATE_SESS = 26,
+   VHOST_USER_CRYPTO_CLOSE_SESS = 27,
VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -79,6 +83,30 @@ typedef struct VhostUserLog {
uint64_t mmap_offset;
 } VhostUserLog;
 
+/* Comply with Cryptodev-Linux */
+#define VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH  512
+#define VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH 64
+
+/* Same structure as vhost-user backend session info */
+typedef struct VhostUserCryptoSessionParam {
+   int64_t session_id;
+   uint32_t op_code;
+   uint32_t cipher_algo;
+   uint32_t cipher_key_len;
+   uint32_t hash_algo;
+   uint32_t digest_len;
+   uint32_t auth_key_len;
+   uint32_t aad_len;
+   uint8_t op_type;
+   uint8_t dir;
+   uint8_t hash_mode;
+   uint8_t chaining_dir;
+   uint8_t *ciphe_key;
+   uint8_t *auth_key;
+   uint8_t cipher_key_buf[VHOST_USER_CRYPTO_MAX_CIPHER_KEY_LENGTH];
+   uint8_t auth_key_buf[VHOST_USER_CRYPTO_MAX_HMAC_KEY_LENGTH];
+} VhostUserCryptoSessionParam;
+
 typedef struct VhostUserMsg {
union {
VhostUserRequest master;
@@ -99,6 +127,7 @@ typedef struct VhostUserMsg {
VhostUserMemory memory;
VhostUserLog log;
struct vhost_iotlb_msg iotlb;
+   VhostUserCryptoSessionParam crypto_session;
} payload;
int fds[VHOST_MEMORY_MAX_NREGIONS];
 } __attribute((packed)) VhostUserMsg;
-- 
2.13.6



[dpdk-dev] [PATCH v2 01/10] lib/librte_vhost: add vhost user private info structure

2018-02-27 Thread Fan Zhang
This patch adds a vhost_user_dev_priv structure and a vhost-user
message handler function prototype to vhost_user. This allows
different types of devices to attach private information and their
device-specific vhost-user message handlers to the virtio_net
structure. vhost_user_msg_handler is also changed to call the
device-specific message handler.

Signed-off-by: Fan Zhang 
---
 lib/librte_vhost/vhost.h  |  5 -
 lib/librte_vhost/vhost_user.c | 13 -
 lib/librte_vhost/vhost_user.h |  7 +++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d947bc9e3..19ee3fd37 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -241,8 +242,10 @@ struct virtio_net {
struct guest_page   *guest_pages;
 
int slave_req_fd;
-} __rte_cache_aligned;
 
+   /* Private data for different virtio device type */
+   void*private_data;
+} __rte_cache_aligned;
 
 #define VHOST_LOG_PAGE 4096
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 90ed2112e..6a90d2a96 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1477,7 +1477,18 @@ vhost_user_msg_handler(int vid, int fd)
break;
 
default:
-   ret = -1;
+   if (!dev->private_data)
+   ret = -1;
+   else {
+   struct vhost_user_dev_priv *priv = dev->private_data;
+
+   if (!priv->vhost_user_msg_handler)
+   ret = -1;
+   else {
+   ret = (*priv->vhost_user_msg_handler)(dev,
+   &msg, fd);
+   }
+   }
break;
 
}
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index d4bd604b9..354615c8b 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -108,6 +108,13 @@ typedef struct VhostUserMsg {
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION 0x1
 
+typedef int (*msg_handler)(struct virtio_net *dev, struct VhostUserMsg *msg,
+   int fd);
+
+struct vhost_user_dev_priv {
+   msg_handler vhost_user_msg_handler;
+   char data[0];
+};
 
 /* vhost_user.c */
 int vhost_user_msg_handler(int vid, int fd);
-- 
2.13.6



[dpdk-dev] [PATCH v2 03/10] lib/librte_vhost: add session message handler

2018-02-27 Thread Fan Zhang
This patch adds the session message handler to vhost crypto.

Signed-off-by: Fan Zhang 
---
 lib/librte_vhost/vhost_crypto.c | 399 
 1 file changed, 399 insertions(+)
 create mode 100644 lib/librte_vhost/vhost_crypto.c

diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c
new file mode 100644
index 0..b7b7ff39d
--- /dev/null
+++ b/lib/librte_vhost/vhost_crypto.c
@@ -0,0 +1,399 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation
+ */
+
+#include 
+#include 
+#include 
+#ifdef RTE_LIBRTE_VHOST_DEBUG
+#include 
+#endif
+#include "vhost.h"
+#include "vhost_user.h"
+#include "rte_vhost_crypto.h"
+
+#define NB_MEMPOOL_OBJS (1024)
+#define NB_CRYPTO_DESCRIPTORS  (1024)
+#define NB_CACHE_OBJS  (128)
+
+#define SESSION_MAP_ENTRIES (1024) /**< Max nb sessions per vdev */
+#define MAX_KEY_SIZE   (32)
+#define VHOST_CRYPTO_MAX_IV_LEN (16)
+#define MAX_COUNT_DOWN_TIMES   (100)
+
+#ifdef RTE_LIBRTE_VHOST_DEBUG
+#define VC_LOG_ERR(fmt, args...)   \
+   RTE_LOG(ERR, USER1, "[%s] %s() line %u: " fmt "\n", \
+   "Vhost-Crypto", __func__, __LINE__, ## args)
+#define VC_LOG_INFO(fmt, args...)  \
+   RTE_LOG(INFO, USER1, "[%s] %s() line %u: " fmt "\n",\
+   "Vhost-Crypto", __func__, __LINE__, ## args)
+
+#define VC_LOG_DBG(fmt, args...)   \
+   RTE_LOG(DEBUG, USER1, "[%s] %s() line %u: " fmt "\n",   \
+   "Vhost-Crypto", __func__, __LINE__, ## args)
+#else
+#define VC_LOG_ERR(fmt, args...)   \
+   RTE_LOG(ERR, USER1, "[VHOST-Crypto]: " fmt "\n", ## args)
+#define VC_LOG_INFO(fmt, args...)  \
+   RTE_LOG(INFO, USER1, "[VHOST-Crypto]: " fmt "\n", ## args)
+#define VC_LOG_DBG(fmt, args...)
+#endif
+
+#define VIRTIO_CRYPTO_FEATURES ((1 << VIRTIO_F_NOTIFY_ON_EMPTY) |  \
+   (1 << VIRTIO_RING_F_INDIRECT_DESC) |\
+   (1 << VIRTIO_RING_F_EVENT_IDX) |\
+   (1 << VIRTIO_CRYPTO_SERVICE_CIPHER) |   \
+   (1 << VIRTIO_CRYPTO_SERVICE_HASH) | \
+   (1 << VIRTIO_CRYPTO_SERVICE_MAC) |  \
+   (1 << VIRTIO_CRYPTO_SERVICE_AEAD) | \
+   (1 << VIRTIO_NET_F_CTRL_VQ))
+
+/**
+ * 1-to-1 mapping between RTE_CRYPTO_*ALGO* and VIRTIO_CRYPTO_*ALGO*, for
+ * algorithms not supported by RTE_CRYPTODEV, the -VIRTIO_CRYPTO_NOTSUPP is
+ * returned.
+ */
+static int cipher_algo_transform[] = {
+   RTE_CRYPTO_CIPHER_NULL,
+   RTE_CRYPTO_CIPHER_ARC4,
+   RTE_CRYPTO_CIPHER_AES_ECB,
+   RTE_CRYPTO_CIPHER_AES_CBC,
+   RTE_CRYPTO_CIPHER_AES_CTR,
+   -VIRTIO_CRYPTO_NOTSUPP, /* VIRTIO_CRYPTO_CIPHER_DES_ECB */
+   RTE_CRYPTO_CIPHER_DES_CBC,
+   RTE_CRYPTO_CIPHER_3DES_ECB,
+   RTE_CRYPTO_CIPHER_3DES_CBC,
+   RTE_CRYPTO_CIPHER_3DES_CTR,
+   RTE_CRYPTO_CIPHER_KASUMI_F8,
+   RTE_CRYPTO_CIPHER_SNOW3G_UEA2,
+   RTE_CRYPTO_CIPHER_AES_F8,
+   RTE_CRYPTO_CIPHER_AES_XTS,
+   RTE_CRYPTO_CIPHER_ZUC_EEA3
+};
+
+/**
+ * VIRTIO_CRYTPO_AUTH_* indexes are not sequential, the gaps are filled with
+ * -VIRTIO_CRYPTO_BADMSG errors.
+ */
+static int auth_algo_transform[] = {
+   RTE_CRYPTO_AUTH_NULL,
+   RTE_CRYPTO_AUTH_MD5_HMAC,
+   RTE_CRYPTO_AUTH_SHA1_HMAC,
+   RTE_CRYPTO_AUTH_SHA224_HMAC,
+   RTE_CRYPTO_AUTH_SHA256_HMAC,
+   RTE_CRYPTO_AUTH_SHA384_HMAC,
+   RTE_CRYPTO_AUTH_SHA512_HMAC,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_NOTSUPP, /* VIRTIO_CRYPTO_MAC_CMAC_3DES */
+   RTE_CRYPTO_AUTH_AES_CMAC,
+   RTE_CRYPTO_AUTH_KASUMI_F9,
+   RTE_CRYPTO_AUTH_SNOW3G_UIA2,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMSG,
+   -VIRTIO_CRYPTO_BADMS

[dpdk-dev] [PATCH v2 04/10] lib/librte_vhost: add request handler

2018-02-27 Thread Fan Zhang
This patch adds the implementation that parses virtio-crypto requests
into DPDK crypto operations.

Signed-off-by: Fan Zhang 
---
 lib/librte_vhost/vhost_crypto.c | 607 
 1 file changed, 607 insertions(+)

diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c
index b7b7ff39d..3b6323cda 100644
--- a/lib/librte_vhost/vhost_crypto.c
+++ b/lib/librte_vhost/vhost_crypto.c
@@ -21,6 +21,10 @@
 #define VHOST_CRYPTO_MAX_IV_LEN (16)
 #define MAX_COUNT_DOWN_TIMES   (100)
 
+#define INHDR_LEN  (sizeof(struct virtio_crypto_inhdr))
+#define IV_OFFSET  (sizeof(struct rte_crypto_op) + \
+   sizeof(struct rte_crypto_sym_op))
+
 #ifdef RTE_LIBRTE_VHOST_DEBUG
 #define VC_LOG_ERR(fmt, args...)   \
RTE_LOG(ERR, USER1, "[%s] %s() line %u: " fmt "\n", \
@@ -49,6 +53,12 @@
(1 << VIRTIO_CRYPTO_SERVICE_AEAD) | \
(1 << VIRTIO_NET_F_CTRL_VQ))
 
+
+#define GPA_TO_VVA(t, m, a)(t)(uintptr_t)rte_vhost_gpa_to_vva(m, a)
+
+/* Macro to get the buffer at the end of rte_crypto_op */
+#define REQ_OP_OFFSET  (IV_OFFSET + VHOST_CRYPTO_MAX_IV_LEN)
+
 /**
  * 1-to-1 mapping between RTE_CRYPTO_*ALGO* and VIRTIO_CRYPTO_*ALGO*, for
  * algorithms not supported by RTE_CRYPTODEV, the -VIRTIO_CRYPTO_NOTSUPP is
@@ -170,6 +180,23 @@ struct vhost_crypto {
uint8_t zero_copy;
 } __rte_cache_aligned;
 
+struct vhost_crypto_data_req {
+   struct vring_desc *head;
+   struct rte_vhost_memory *mem;
+   struct virtio_crypto_inhdr *inhdr;
+
+   uint16_t desc_idx;
+   uint32_t len;
+   struct vhost_virtqueue *vq;
+
+   uint8_t zero_copy;
+
+   int vid;
+
+   struct vring_desc *wb_desc;
+   uint16_t wb_len;
+};
+
 static int
 transform_cipher_param(struct rte_crypto_sym_xform *xform,
VhostUserCryptoSessionParam *param)
@@ -397,3 +424,583 @@ vhost_crypto_msg_handler(struct virtio_net *dev, struct VhostUserMsg *msg,
 
return ret;
 }
+
+static __rte_always_inline struct vring_desc *
+find_write_desc(struct vring_desc *head, struct vring_desc *desc)
+{
+   if (desc->flags & VRING_DESC_F_WRITE)
+   return desc;
+
+   while (desc->flags & VRING_DESC_F_NEXT) {
+   desc = &head[desc->next];
+   if (desc->flags & VRING_DESC_F_WRITE)
+   return desc;
+   }
+
+   return NULL;
+}
+
+static struct virtio_crypto_inhdr *
+reach_inhdr(struct vring_desc *head, struct rte_vhost_memory *mem,
+   struct vring_desc *desc)
+{
+   while (desc->flags & VRING_DESC_F_NEXT)
+   desc = &head[desc->next];
+
+   return GPA_TO_VVA(struct virtio_crypto_inhdr *, mem, desc->addr);
+}
+
+static __rte_always_inline int
+move_desc(struct vring_desc *head, struct vring_desc **cur_desc,
+   uint32_t size)
+{
+   struct vring_desc *desc = *cur_desc;
+   int left = size;
+
+   rte_prefetch0(&head[desc->next]);
+   left -= desc->len;
+
+   while ((desc->flags & VRING_DESC_F_NEXT) && left > 0) {
+   desc = &head[desc->next];
+   rte_prefetch0(&head[desc->next]);
+   left -= desc->len;
+   }
+
+   if (unlikely(left < 0)) {
+   VC_LOG_ERR("Incorrect virtio descriptor");
+   return -1;
+   }
+
+   *cur_desc = &head[desc->next];
+   return 0;
+}
+
+static int
copy_data(void *dst_data, struct vring_desc *head, struct rte_vhost_memory *mem,
+   struct vring_desc **cur_desc, uint32_t size)
+{
+   struct vring_desc *desc = *cur_desc;
+   uint32_t to_copy;
+   uint8_t *data = dst_data;
+   uint8_t *src;
+   int left = size;
+
+   rte_prefetch0(&head[desc->next]);
+   to_copy = RTE_MIN(desc->len, (uint32_t)left);
+   src = GPA_TO_VVA(uint8_t *, mem, desc->addr);
+   rte_memcpy((uint8_t *)data, src, to_copy);
+   left -= to_copy;
+
+   while ((desc->flags & VRING_DESC_F_NEXT) && left > 0) {
+   desc = &head[desc->next];
+   rte_prefetch0(&head[desc->next]);
+   to_copy = RTE_MIN(desc->len, (uint32_t)left);
+   src = GPA_TO_VVA(uint8_t *, mem, desc->addr);
+   rte_memcpy(data + size - left, src, to_copy);
+   left -= to_copy;
+   }
+
+   if (unlikely(left < 0)) {
+   VC_LOG_ERR("Incorrect virtio descriptor");
+   return -1;
+   }
+
+   *cur_desc = &head[desc->next];
+
+   return 0;
+}
+
+static __rte_always_inline void *
+get_data_ptr(struct vring_desc *head, struct rte_vhost_memory *mem,
+   struct vring_desc **cur_desc, uint32_t size)
+{
+   void *data;
+
+   data = GPA_TO_VVA(void *, mem, (*cur_desc)->addr);
+   if (unlikely(!data)) {
+   VC_LOG_ERR("Failed to get object");
+   ret

[dpdk-dev] [PATCH v2 05/10] lib/librte_vhost: add head file

2018-02-27 Thread Fan Zhang
This patch adds the public header file API for vhost crypto.

Signed-off-by: Fan Zhang 
---
 lib/librte_vhost/rte_vhost_crypto.h | 122 
 1 file changed, 122 insertions(+)
 create mode 100644 lib/librte_vhost/rte_vhost_crypto.h

diff --git a/lib/librte_vhost/rte_vhost_crypto.h b/lib/librte_vhost/rte_vhost_crypto.h
new file mode 100644
index 0..1560fcc12
--- /dev/null
+++ b/lib/librte_vhost/rte_vhost_crypto.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation
+ */
+
+#ifndef _VHOST_CRYPTO_H_
+#define _VHOST_CRYPTO_H_
+
+#include 
+#include 
+#include 
+#include 
+#include "rte_vhost.h"
+
+#ifndef MAX_DATA_QUEUES
+#define MAX_DATA_QUEUES (1)
+#endif
+
+#define VIRTIO_CRYPTO_CTRL_QUEUE   (0)
+#define VIRTIO_CRYPTO_MAX_NUM_DEVS (64)
+#define VIRTIO_CRYPTO_MAX_NUM_BURST_VQS (64)
+
+/** Feature bits */
+#define VIRTIO_CRYPTO_F_CIPHER_SESSION_MODE (1)
+#define VIRTIO_CRYPTO_F_HASH_SESSION_MODE  (2)
+#define VIRTIO_CRYPTO_F_MAC_SESSION_MODE   (3)
+#define VIRTIO_CRYPTO_F_AEAD_SESSION_MODE  (4)
+
+#define VHOST_CRYPTO_MBUF_POOL_SIZE (8192)
+#define VHOST_CRYPTO_MAX_BURST_SIZE(64)
+
+/**
+ *  Create Vhost-crypto instance
+ *
+ * @param vid
+ *  The identifier of the vhost device.
+ * @param cryptodev_id
+ *  The identifier of DPDK Cryptodev, the same cryptodev_id can be assigned to
+ *  multiple Vhost-crypto devices.
+ * @param sess_pool
+ *  The pointer to the created cryptodev session pool, with a private data
+ *  size matching the target DPDK Cryptodev.
+ * @param socket_id
+ *  NUMA Socket ID to allocate resources on.
+ * @return
+ *  0 if the Vhost Crypto Instance is created successfully.
+ *  Negative integer if otherwise
+ */
+int
+rte_vhost_crypto_create(int vid, uint8_t cryptodev_id,
+   struct rte_mempool *sess_pool, int socket_id);
+
+/**
+ *  Free the Vhost-crypto instance
+ *
+ * @param vid
+ *  The identifier of the vhost device.
+ * @return
+ *  0 if the Vhost Crypto Instance is freed successfully.
+ *  Negative integer if otherwise.
+ */
+int
+rte_vhost_crypto_free(int vid);
+
+/**
+ *  Enable or disable zero copy feature
+ *
+ * @param vid
+ *  The identifier of the vhost device.
+ * @param enable_zc
+ *  Flag of zero copy feature, set 1 to enable or 0 to disable.
+ * @return
+ *  0 if completed successfully.
+ *  Negative integer if otherwise.
+ */
+int
+rte_vhost_crypto_set_zero_copy(int vid, uint32_t enable_zc);
+
+/**
+ * Fetch a number of vring descriptors from virt-queue and convert to DPDK
+ * crypto operations. After this function is executed, the user can enqueue
+ * the processed ops to the target cryptodev.
+ *
+ * @param vid
+ *  The identifier of the vhost device.
+ * @param qid
+ *  Virtio queue index.
+ * @param ops
+ *  The address of an array of pointers to *rte_crypto_op* structures.
+ * @param nb_ops
+ *  The maximum number of operations to be fetched and translated.
+ * @return
+ *  The number of fetched and processed vhost crypto request operations.
+ */
+uint16_t
+rte_vhost_crypto_fetch_requests(int vid, uint32_t qid,
+   struct rte_crypto_op **ops, uint16_t nb_ops);
+/**
+ * Finalize the dequeued crypto ops. After the converted crypto ops are
+ * dequeued from the cryptodev, this function shall be called to update the
+ * used-ring indexes and write the processed data back to the vring descriptor
+ * (if zero-copy is disabled).
+ *
+ * @param ops
+ *  The address of an array of *rte_crypto_op* structure that was dequeued
+ *  from cryptodev.
+ * @param nb_ops
+ *  The number of operations contained in the array.
+ * @param callfds
+ *  The pointer to an array to be filled with the callfd numbers of the
+ *  virtio-crypto requests contained in the finalized cryptodev operations.
+ *  The size of the array shall be no less than the maximum possible number
+ *  of virtual devices.
+ * @param nb_callfds
+ *  The number of callfd numbers written to the callfds array.
+ * @return
+ *  The number of ops processed.
+ */
+uint16_t
+rte_vhost_crypto_finalize_requests(struct rte_crypto_op **ops,
+   uint16_t nb_ops, int *callfds, uint16_t *nb_callfds);
+
+#endif /**< _VHOST_CRYPTO_H_ */
-- 
2.13.6


