Re: [dpdk-dev] Survey for final decision about per-port offload API

2018-04-12 Thread Maxime Coquelin

Hi Thomas,

On 03/30/2018 03:47 PM, Thomas Monjalon wrote:

There are some discussions about a specific part of the offload API:
"To enable per-port offload, the offload should be set on both
device configuration and queue setup."

It means the application must repeat the port offload flags
in rte_eth_conf.[rt]xmode.offloads and rte_eth_[rt]xconf.offloads,
when calling respectively rte_eth_dev_configure() and
rte_eth_[rt]x_queue_setup() for each queue.

The PMD must check if there is a mismatch, i.e. a port offload not
repeated in the queue setup.
There is a proposal to do this check at ethdev level:
http://dpdk.org/ml/archives/dev/2018-March/094023.html

It was also proposed to relax the API and allow "forgetting" port
offloads in queue offloads:
http://dpdk.org/ml/archives/dev/2018-March/092978.html

It would mean the offloads applied to a queue result from an OR operation:
rte_eth_conf.[rt]xmode.offloads | rte_eth_[rt]xconf.offloads

1/ Do you agree with above API change?



Yes


If we agree with this change, we need to update the documentation
and remove the checks in PMDs.
Note: no matter what is decided here, 18.05-rc1 should have all PMDs
switched to the API which was defined in 17.11.
Given that the API is new and not yet adopted by applications,
the sooner it is fixed, the better.

2/ Should we do this change in 18.05-rc2?



Yes


At the same time, we want to make clear that an offload enabled at
port level cannot be disabled at queue level.

3/ Do you agree with above statement (to be added in the doc)?



Yes


There is the same kind of confusion in the offload capabilities:
rte_eth_dev_info.[rt]x_offload_capa
rte_eth_dev_info.[rt]x_queue_offload_capa
The queue capabilities must be a subset of the port capabilities,
i.e. every queue capability must also be reported as a port capability.
But a port capability should be reported at queue level
only if it can be applied to a specific queue.

4/ Do you agree with above statement (to be added in the doc)?


Yes



Please give your opinion on questions 1, 2, 3 and 4.
Answering by yes/no may be sufficient in most cases :)
Thank you





Thanks,
Maxime


Re: [dpdk-dev] [PATCH v2 4/4] ether: add packet modification aciton in flow API

2018-04-12 Thread Adrien Mazarguil
On Sun, Apr 01, 2018 at 05:19:22PM -0400, Qi Zhang wrote:
> Add new actions that be used to modify packet content with
> generic semantic:
> 
> RTE_FLOW_ACTION_TYPE_FIELD_UPDATE:
>   - update specific field of packet
> RTE_FLWO_ACTION_TYPE_FIELD_INCREMENT:
>   - increament specific field of packet
> RTE_FLWO_ACTION_TYPE_FIELD_DECREMENT:
>   - decreament specific field of packet
> RTE_FLWO_ACTION_TYPE_FIELD_COPY:
>   - copy data from one field to another in packet.
> 
> All action use struct rte_flow_item parameter to match the pattern
> that going to be modified, if no pattern match, the action just be
> skipped.

That's not good. It must result in undefined behavior, more about that
below.

> These action are non-terminating action. they will not
> impact the fate of the packets.
> 
> Signed-off-by: Qi Zhang 

Noticed a few typos above and in subject line ("aciton", "FLWO",
"increament", "decreament").

Note that I'm usually against using rte_flow_item structures and associated
enum values inside action lists because it could be seen as inconsistent
from an API standpoint. On the other hand, reusing existing types is a good
thing so let's go with that for now.

Please see inline comments.

> ---
>  doc/guides/prog_guide/rte_flow.rst | 89 
> ++
>  lib/librte_ether/rte_flow.h| 57 
>  2 files changed, 146 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst 
> b/doc/guides/prog_guide/rte_flow.rst
> index aa5c818..6628964 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1508,6 +1508,95 @@ Representor.
> | ``port_id``  | identification of the destination |
> +--+---+
>  
> +Action: ``FILED_UPDATE``
> +^^^

FILED => FIELD

Underline is also shorter than title and might cause documentation warnings.

> +
> +Update specific field of the packet.
> +
> +- Non-terminating by default.

These statements are not needed since "ethdev: alter behavior of flow API
actions" [1].

[1] http://dpdk.org/ml/archives/dev/2018-April/096527.html

> +
> +.. _table_rte_flow_action_field_update:
> +
> +.. table:: FIELD_UPDATE
> +
> +   +---+-+
> +   | Field | Value   |
> +   +===+=+
> +   | ``item``  | item->type: specify the pattern to modify   |
> +   |   | item->spec: specify the new value to update |
> +   |   | item->mask: specify which part of the pattern to update |
> +   |   | item->last: ignored |

This table needs to be divided a bit more with one cell per field for better
clarity. See other pattern item definitions such as "Item: ``RAW``" for an
example.

> +   +---+-+
> +   | ``layer`` | 0 means outermost matched pattern,  |
> +   |   | 1 means next-to-outermost and so on ... |
> +   +---+-+

What does "layer" refer to by the way? The layer described on the pattern
side of the flow rule, the actual protocol layer matched inside traffic, or
is "item" actually an END-terminated list of items (as suggested by
"pattern" in above documentation)?

I suspect the intent is for layer to have the same definition as RSS
encapsulation level ("ethdev: add encap level to RSS flow API action" [2]),
and item points to a single item, correct?

In that case, it's misleading, please rename it "level". Also keep in mind
you can't make an action rely on anything found on the pattern side of a
flow rule.

What happens when this action is attempted on non-matching traffic must be
documented here as well. Refer to discussion re "ethdev: Add tunnel
encap/decap actions" [3]. To be on the safe side, it must be documented as
resulting in undefined behavior.

Based on the same thread, I also suggest defining "last" here as reserved and
therefore an error if set to anything other than NULL; however, it might
prove useful, see below.

[2] http://dpdk.org/ml/archives/dev/2018-April/096531.html
[3] http://dpdk.org/ml/archives/dev/2018-April/096418.html

> +
> +Action: ``FILED_INCREMENT``
> +^^^

FILED => FIELD

> +
> +Increment 1 on specific field of the packet.

All right, but what for? FIELD_UPDATE overwrites a specific value at some
specific place after matching something rather specific.

In my opinion to get predictable results with FIELD_INCREMENT, applications
also need to have a pretty good idea of what's about to be incremented.
That's because you can't put conditionals in flow rules (yet). So if you
need to match an exact IPv4 address in order to increment it, why wouldn't
you just 

Re: [dpdk-dev] [PATCH 2/2] net/vhost: insert/strip VLAN header in software

2018-04-12 Thread Maxime Coquelin



On 03/29/2018 06:05 PM, Chas Williams wrote:

From: Jan Blunck 

This lets the vhost driver handle the VLAN header like the virtio driver
in software.

Signed-off-by: Jan Blunck 
---
  drivers/net/vhost/rte_eth_vhost.c | 35 ++-
  1 file changed, 34 insertions(+), 1 deletion(-)

Applied to dpdk-next-virtio/master.

Thanks!
Maxime


[dpdk-dev] [PATCH v6 4/4] doc: add ifcvf driver document and release note

2018-04-12 Thread Xiao Wang
Signed-off-by: Xiao Wang 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---
 doc/guides/nics/features/ifcvf.ini |  8 +++
 doc/guides/nics/ifcvf.rst  | 98 ++
 doc/guides/nics/index.rst  |  1 +
 doc/guides/rel_notes/release_18_05.rst |  9 
 4 files changed, 116 insertions(+)
 create mode 100644 doc/guides/nics/features/ifcvf.ini
 create mode 100644 doc/guides/nics/ifcvf.rst

diff --git a/doc/guides/nics/features/ifcvf.ini 
b/doc/guides/nics/features/ifcvf.ini
new file mode 100644
index 0..ef1fc4711
--- /dev/null
+++ b/doc/guides/nics/features/ifcvf.ini
@@ -0,0 +1,8 @@
+;
+; Supported features of the 'ifcvf' vDPA driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+x86-32   = Y
+x86-64   = Y
diff --git a/doc/guides/nics/ifcvf.rst b/doc/guides/nics/ifcvf.rst
new file mode 100644
index 0..d7e76353c
--- /dev/null
+++ b/doc/guides/nics/ifcvf.rst
@@ -0,0 +1,98 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(c) 2018 Intel Corporation.
+
+IFCVF vDPA driver
+=
+
+The IFCVF vDPA (vhost data path acceleration) driver provides support for the
+Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, and it
+works as a HW vhost backend which can send/receive packets to/from virtio
+directly by DMA. In addition, it supports dirty page logging and device state
+report/restore. This driver enables its vDPA functionality with the live
+migration feature.
+
+
+Pre-Installation Configuration
+--
+
+Config File Options
+~~~
+
+The following option can be modified in the ``config`` file.
+
+- ``CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD`` (default ``y`` for linux)
+
+  Toggle compilation of the ``librte_ifcvf_vdpa`` driver.
+
+
+IFCVF vDPA Implementation
+-
+
+IFCVF's vendor ID and device ID are the same as those of the virtio net PCI
+device, with its own specific subsystem vendor ID and device ID. To let the
+device be probed by the IFCVF driver, add a "vdpa=1" parameter to specify
+that this device is to be used in vDPA mode rather than polling mode; the
+virtio PMD will skip the device when it detects this parameter.
+
+Different VF devices serve different virtio frontends which are in different
+VMs, so each VF needs to have its own DMA address translation service. During
+the driver probe, a new container is created for this device; with this
+container the vDPA driver can program the DMA remapping table with the VM's
+memory region information.
+
+Key IFCVF vDPA driver ops
+~
+
+- ifcvf_dev_config:
+  Enable VF data path with virtio information provided by vhost lib, including
+  IOMMU programming to enable VF DMA to VM's memory, VFIO interrupt setup to
+  route HW interrupt to virtio driver, create notify relay thread to translate
+  virtio driver's kick to a MMIO write onto HW, HW queues configuration.
+
+  This function gets called to set up HW data path backend when virtio driver
+  in VM gets ready.
+
+- ifcvf_dev_close:
+  Revoke all the setup in ifcvf_dev_config.
+
+  This function gets called when virtio driver stops device in VM.
+
+To create a vhost port with IFC VF
+~~
+
+- Create a vhost socket and assign a VF's device ID to this socket via
+  vhost API. When QEMU vhost connection gets ready, the assigned VF will
+  get configured automatically.
+
+
+Features
+
+
+Features of the IFCVF driver are:
+
+- Compatibility with virtio 0.95 and 1.0.
+- Live migration.
+
+
+Prerequisites
+-
+
+- Platform with IOMMU feature. IFC VF needs the address translation service to
+  Rx/Tx directly with the virtio driver in the VM.
+
+
+Limitations
+---
+
+Dependency on vfio-pci
+~~
+
+The vDPA driver needs to set up VF MSI-X interrupts; each queue's interrupt
+vector is mapped to a callfd associated with a virtio ring. Currently only
+vfio-pci allows multiple interrupts, so the IFCVF driver depends on vfio-pci.
+
+Live Migration with VIRTIO_NET_F_GUEST_ANNOUNCE
+~~~
+
+IFC VF doesn't support RARP packet generation; a virtio frontend supporting
+the VIRTIO_NET_F_GUEST_ANNOUNCE feature can help to do that.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 51c453d9c..a294ab389 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -44,6 +44,7 @@ Network Interface Controller Drivers
 vmxnet3
 pcap_ring
 fail_safe
+ifcvf
 
 **Figures**
 
diff --git a/doc/guides/rel_notes/release_18_05.rst 
b/doc/guides/rel_notes/release_18_05.rst
index 3e1ae0cfd..1bf609f6b 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,15 @@ API Changes
Also, make sure to start the actual text at the margin.
=
 
+* **Added IFCVF vDPA driver.**
+
+  

[dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support

2018-04-12 Thread Xiao Wang
Currently eal vfio framework binds vfio group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put vfio group
to a separate container and program IOMMU via this container.

This patch adds some APIs to support container creation and binding a
device to a container.

A driver could use the "rte_vfio_create_container" helper to create a
new container from EAL, and "rte_vfio_bind_group" to bind a device
to the newly created container.

During rte_vfio_setup_device, the container bound with the device
will be used for IOMMU setup.

Signed-off-by: Junjie Chen 
Signed-off-by: Xiao Wang 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---
 config/common_base   |   1 +
 lib/librte_eal/bsdapp/eal/eal.c  |  50 +++
 lib/librte_eal/common/include/rte_vfio.h | 113 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 522 +--
 lib/librte_eal/linuxapp/eal/eal_vfio.h   |   1 +
 lib/librte_eal/rte_eal_version.map   |   6 +
 6 files changed, 601 insertions(+), 92 deletions(-)

diff --git a/config/common_base b/config/common_base
index c09c7cf88..90c2821ae 100644
--- a/config/common_base
+++ b/config/common_base
@@ -74,6 +74,7 @@ CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=n
 CONFIG_RTE_EAL_VFIO=n
 CONFIG_RTE_MAX_VFIO_GROUPS=64
+CONFIG_RTE_MAX_VFIO_CONTAINERS=64
 CONFIG_RTE_MALLOC_DEBUG=n
 CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
 CONFIG_RTE_USE_LIBBSD=n
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 4eafcb5ad..0a3d8783d 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -746,6 +746,14 @@ int rte_vfio_enable(const char *modname);
 int rte_vfio_is_enabled(const char *modname);
 int rte_vfio_noiommu_is_enabled(void);
 int rte_vfio_clear_group(int vfio_group_fd);
+int rte_vfio_create_container(void);
+int rte_vfio_destroy_container(int container_fd);
+int rte_vfio_bind_group(int container_fd, int iommu_group_no);
+int rte_vfio_unbind_group(int container_fd, int iommu_group_no);
+int rte_vfio_dma_map(int container_fd, int dma_type,
+   const struct rte_memseg *ms);
+int rte_vfio_dma_unmap(int container_fd, int dma_type,
+   const struct rte_memseg *ms);
 
 int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
  __rte_unused const char *dev_addr,
@@ -781,3 +789,45 @@ int rte_vfio_clear_group(__rte_unused int vfio_group_fd)
 {
return 0;
 }
+
+int __rte_experimental
+rte_vfio_create_container(void)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_destroy_container(__rte_unused int container_fd)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_bind_group(__rte_unused int container_fd,
+   __rte_unused int iommu_group_no)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_unbind_group(__rte_unused int container_fd,
+   __rte_unused int iommu_group_no)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_dma_map(__rte_unused int container_fd,
+   __rte_unused int dma_type,
+   __rte_unused const struct rte_memseg *ms)
+{
+   return -1;
+}
+
+int __rte_experimental
+rte_vfio_dma_unmap(__rte_unused int container_fd,
+   __rte_unused int dma_type,
+   __rte_unused const struct rte_memseg *ms)
+{
+   return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_vfio.h 
b/lib/librte_eal/common/include/rte_vfio.h
index 249095e46..9bb026703 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -32,6 +32,8 @@
 extern "C" {
 #endif
 
+struct rte_memseg;
+
 /**
  * Setup vfio_cfg for the device identified by its address.
  * It discovers the configured I/O MMU groups or sets a new one for the device.
@@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Create a new container for device binding.
+ *
+ * @return
+ *   the container fd if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_create_container(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Destroy the container, unbind all vfio groups within it.
+ *
+ * @param container_fd
+ *   the container fd to destroy
+ *
+ * @return
+ *0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_destroy_container(int container_fd);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Bind a IOMMU group to a container.
+ *
+ * @param container_fd
+ *   the container's fd
+ *
+ * @param iommu_group_no
+ *   the iommu_group_no to bind to container
+ *
+ * @return
+ *   group fd if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_bind_group(int container_fd, int iommu_group_no);
+
+/**
+ * @

[dpdk-dev] [PATCH v6 3/4] net/ifcvf: add ifcvf vdpa driver

2018-04-12 Thread Xiao Wang
The IFCVF vDPA (vhost data path acceleration) driver provides support for
the Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible,
and it works as a HW vhost backend which can send/receive packets to/from
virtio directly by DMA.

Different VF devices serve different virtio frontends which are in
different VMs, so each VF needs to have its own DMA address translation
service. During the driver probe, a new container is created; with this
container the vDPA driver can program the DMA remapping table with the
VM's memory region information.

Key vDPA driver ops implemented:

- ifcvf_dev_config:
  Enable VF data path with virtio information provided by vhost lib,
  including IOMMU programming to enable VF DMA to VM's memory, VFIO
  interrupt setup to route HW interrupt to virtio driver, create notify
  relay thread to translate virtio driver's kick to a MMIO write onto HW,
  HW queues configuration.

- ifcvf_dev_close:
  Revoke all the setup in ifcvf_dev_config.

The live migration feature is supported by IFCVF and this driver enables
it. For dirty page logging, the VF helps to log packet buffer writes, and
the driver marks the used ring as dirty when the device stops.

Because vDPA driver needs to set up MSI-X vector to interrupt the
guest, only vfio-pci is supported currently.

Signed-off-by: Xiao Wang 
Signed-off-by: Rosen Xu 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---
 config/common_base|   7 +
 config/common_linuxapp|   1 +
 drivers/net/Makefile  |   3 +
 drivers/net/ifc/Makefile  |  36 ++
 drivers/net/ifc/base/ifcvf.c  | 329 +
 drivers/net/ifc/base/ifcvf.h  | 160 +++
 drivers/net/ifc/base/ifcvf_osdep.h|  52 +++
 drivers/net/ifc/ifcvf_vdpa.c  | 845 ++
 drivers/net/ifc/rte_ifcvf_version.map |   4 +
 mk/rte.app.mk |   3 +
 10 files changed, 1440 insertions(+)
 create mode 100644 drivers/net/ifc/Makefile
 create mode 100644 drivers/net/ifc/base/ifcvf.c
 create mode 100644 drivers/net/ifc/base/ifcvf.h
 create mode 100644 drivers/net/ifc/base/ifcvf_osdep.h
 create mode 100644 drivers/net/ifc/ifcvf_vdpa.c
 create mode 100644 drivers/net/ifc/rte_ifcvf_version.map

diff --git a/config/common_base b/config/common_base
index 90c2821ae..8d5d95868 100644
--- a/config/common_base
+++ b/config/common_base
@@ -790,6 +790,13 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
+#
+# Compile IFCVF driver
+# To compile, CONFIG_RTE_LIBRTE_VHOST and CONFIG_RTE_EAL_VFIO
+# should be enabled.
+#
+CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD=n
+
 #
 # Compile the test application
 #
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d0437e5d6..14e56cb4d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -15,6 +15,7 @@ CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
+CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 37ca19aa7..d3fafbfe1 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -57,6 +57,9 @@ endif # $(CONFIG_RTE_LIBRTE_SCHED)
 
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+ifeq ($(CONFIG_RTE_EAL_VFIO),y)
+DIRS-$(CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD) += ifc
+endif
 endif # $(CONFIG_RTE_LIBRTE_VHOST)
 
 ifeq ($(CONFIG_RTE_LIBRTE_MVPP2_PMD),y)
diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile
new file mode 100644
index 0..95bb8d769
--- /dev/null
+++ b/drivers/net/ifc/Makefile
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_ifcvf_vdpa.a
+
+LDLIBS += -lpthread
+LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
+
+#
+# Add extra flags for base driver source files to disable warnings in them
+#
+BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir $(wildcard 
$(SRCDIR)/base/*.c
+
+VPATH += $(SRCDIR)/base
+
+EXPORT_MAP := rte_ifcvf_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD) += ifcvf_vdpa.c
+SRCS-$(CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD) += ifcvf.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/ifc/base/ifcvf.c b/drivers/net/ifc/base/ifcvf.c
new file mode 100644
index 0..d312ad99f
--- /dev/null
+++ b/drivers/net/ifc/base/ifcvf.c
@@ -0,0 +1,329 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include "ifcvf.h"
+#include "ifcvf_osdep.h"
+
+STATIC void *
+get_cap_addr(struct ifcvf_hw *hw, struct ifcvf_pci_cap *cap)
+{
+   u8 bar = cap->bar;
+ 

[dpdk-dev] [PATCH v6 0/4] add ifcvf vdpa driver

2018-04-12 Thread Xiao Wang
IFCVF driver

The IFCVF vDPA (vhost data path acceleration) driver provides support for the
Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, and it
works as a HW vhost backend which can send/receive packets to/from virtio
directly by DMA. In addition, it supports dirty page logging and device state
report/restore. This driver enables its vDPA functionality with the live
migration feature.

vDPA mode
=
IFCVF's vendor ID and device ID are the same as those of the virtio net PCI
device, with its own specific subsystem vendor ID and device ID. To let the
device be probed by the IFCVF driver, add a "vdpa=1" parameter to specify that
this device is to be used in vDPA mode rather than polling mode; the virtio
PMD will skip the device when it detects this parameter.

Container per device

vDPA needs to create different containers for different devices, thus this
patch set adds some APIs in eal/vfio to support multiple containers, e.g.
- rte_vfio_create_container
- rte_vfio_destroy_container
- rte_vfio_bind_group
- rte_vfio_unbind_group

By this extension, a device can be put into a new specific container, rather
than the previous default container.

IFCVF vDPA details
==
Key vDPA driver ops implemented:
- ifcvf_dev_config:
  Enable VF data path with virtio information provided by vhost lib, including
  IOMMU programming to enable VF DMA to VM's memory, VFIO interrupt setup to
  route HW interrupt to virtio driver, create notify relay thread to translate
  virtio driver's kick to a MMIO write onto HW, HW queues configuration.

  This function gets called to set up HW data path backend when virtio driver
  in VM gets ready.

- ifcvf_dev_close:
  Revoke all the setup in ifcvf_dev_config.

  This function gets called when virtio driver stops device in VM.

Change log
==
v6:
- Rebase on master branch.
- Document "vdpa" devarg in virtio documentation.
- Rename ifcvf config option to CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD for
  consistency, and add it into the driver documentation.
- Add comments for ifcvf device ID.
- Minor code cleaning.

v5:
- Fix compilation in BSD, remove the rte_vfio.h including in BSD.

v4:
- Rebase on Zhihong's latest vDPA lib patch, with vDPA ops names change.
- Remove API "rte_vfio_get_group_fd", "rte_vfio_bind_group" will return the fd.
- Align the vfio_cfg search internal APIs naming.

v3:
- Add doc and release note for the new driver.
- Remove the vdev concept, make the driver as a PCI driver, it will get probed
  by PCI bus driver.
- Rebase on the v4 vDPA lib patch, register a vDPA device instead of an engine.
- Remove the PCI API exposure accordingly.
- Move the MAX_VFIO_CONTAINERS definition to config file.
- Let virtio pmd skips when a virtio device needs to work in vDPA mode.

v2:
- Rename function pci_get_kernel_driver_by_path to rte_pci_device_kdriver_name
  to make the API generic cross Linux and BSD, make it as EXPERIMENTAL.
- Rebase on Zhihong's vDPA v3 patch set.
- Minor code cleanup on vfio extension.


Xiao Wang (4):
  eal/vfio: add multiple container support
  net/virtio: skip device probe in vdpa mode
  net/ifcvf: add ifcvf vdpa driver
  doc: add ifcvf driver document and release note

 config/common_base   |   8 +
 config/common_linuxapp   |   1 +
 doc/guides/nics/features/ifcvf.ini   |   8 +
 doc/guides/nics/ifcvf.rst|  98 
 doc/guides/nics/index.rst|   1 +
 doc/guides/nics/virtio.rst   |  13 +
 doc/guides/rel_notes/release_18_05.rst   |   9 +
 drivers/net/Makefile |   3 +
 drivers/net/ifc/Makefile |  36 ++
 drivers/net/ifc/base/ifcvf.c | 329 
 drivers/net/ifc/base/ifcvf.h | 160 ++
 drivers/net/ifc/base/ifcvf_osdep.h   |  52 ++
 drivers/net/ifc/ifcvf_vdpa.c | 845 +++
 drivers/net/ifc/rte_ifcvf_version.map|   4 +
 drivers/net/virtio/virtio_ethdev.c   |  43 ++
 lib/librte_eal/bsdapp/eal/eal.c  |  50 ++
 lib/librte_eal/common/include/rte_vfio.h | 113 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 522 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.h   |   1 +
 lib/librte_eal/rte_eal_version.map   |   6 +
 mk/rte.app.mk|   3 +
 21 files changed, 2213 insertions(+), 92 deletions(-)
 create mode 100644 doc/guides/nics/features/ifcvf.ini
 create mode 100644 doc/guides/nics/ifcvf.rst
 create mode 100644 drivers/net/ifc/Makefile
 create mode 100644 drivers/net/ifc/base/ifcvf.c
 create mode 100644 drivers/net/ifc/base/ifcvf.h
 create mode 100644 drivers/net/ifc/base/ifcvf_osdep.h
 create mode 100644 drivers/net/ifc/ifcvf_vdpa.c
 create mode 100644 drivers/net/ifc/rte_ifcvf_version.map

-- 
2.15.1



Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Tan, Jianfeng



On 4/12/2018 1:02 AM, Junjie Chen wrote:

dev_start sets *dev_attached* after setup queues, this sets device to
invalid state since no frontend is attached. Also destroy_device set
*started* to zero which makes *allow_queuing* always zero until dev_start
get called again. Actually, we should not determine queues existence by
*dev_attached* but by queues pointers or other separated variable(s).

Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
dynamically")

Signed-off-by: Junjie Chen 
Tested-by: Jens Freimann 


Overall, looks great to me except a nit below.

Reviewed-by: Jianfeng Tan 


---
Changes in v3:
- remove useless log in queue status showing
Changes in v2:
- use started to determine vhost queues readiness
- revert setting started to zero in destroy_device
  drivers/net/vhost/rte_eth_vhost.c | 59 +++
  1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 11b6076..e392d71 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -528,10 +528,11 @@ update_queuing_status(struct rte_eth_dev *dev)
unsigned int i;
int allow_queuing = 1;
  
-	if (rte_atomic32_read(&internal->dev_attached) == 0)

+   if (!dev->data->rx_queues || !dev->data->tx_queues)
return;
  
-	if (rte_atomic32_read(&internal->started) == 0)

+   if (rte_atomic32_read(&internal->started) == 0 ||
+   rte_atomic32_read(&internal->dev_attached) == 0)
allow_queuing = 0;
  
  	/* Wait until rx/tx_pkt_burst stops accessing vhost device */

@@ -607,13 +608,10 @@ new_device(int vid)
  #endif
  
  	internal->vid = vid;

-   if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+   if (rte_atomic32_read(&internal->started) == 1)
queue_setup(eth_dev, internal);
-   rte_atomic32_set(&internal->dev_attached, 1);
-   } else {
-   RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
-   rte_atomic32_set(&internal->dev_attached, 0);
-   }
+   else
+   RTE_LOG(INFO, PMD, "RX/TX queues do not exist yet\n");
  
  	for (i = 0; i < rte_vhost_get_vring_num(vid); i++)

rte_vhost_enable_guest_notification(vid, i, 0);
@@ -622,6 +620,7 @@ new_device(int vid)
  
  	eth_dev->data->dev_link.link_status = ETH_LINK_UP;
  
+	rte_atomic32_set(&internal->dev_attached, 1);

update_queuing_status(eth_dev);
  
  	RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid);

@@ -651,23 +650,24 @@ destroy_device(int vid)
eth_dev = list->eth_dev;
internal = eth_dev->data->dev_private;
  
-	rte_atomic32_set(&internal->started, 0);

-   update_queuing_status(eth_dev);
rte_atomic32_set(&internal->dev_attached, 0);
+   update_queuing_status(eth_dev);
  
  	eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
  
-	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {

-   vq = eth_dev->data->rx_queues[i];
-   if (vq == NULL)
-   continue;
-   vq->vid = -1;
-   }
-   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-   vq = eth_dev->data->tx_queues[i];
-   if (vq == NULL)
-   continue;
-   vq->vid = -1;
+   if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+   vq = eth_dev->data->rx_queues[i];
+   if (!vq)
+   continue;
+   vq->vid = -1;
+   }
+   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+   vq = eth_dev->data->tx_queues[i];
+   if (!vq)
+   continue;
+   vq->vid = -1;
+   }
}
  
  	state = vring_states[eth_dev->data->port_id];

@@ -792,11 +792,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
  {
struct pmd_internal *internal = eth_dev->data->dev_private;
  
-	if (unlikely(rte_atomic32_read(&internal->dev_attached) == 0)) {

-   queue_setup(eth_dev, internal);
-   rte_atomic32_set(&internal->dev_attached, 1);
-   }
-
+   queue_setup(eth_dev, internal);
rte_atomic32_set(&internal->started, 1);
update_queuing_status(eth_dev);
  
@@ -836,10 +832,13 @@ eth_dev_close(struct rte_eth_dev *dev)

pthread_mutex_unlock(&internal_list_lock);
rte_free(list);
  
-	for (i = 0; i < dev->data->nb_rx_queues; i++)

-   rte_free(dev->data->rx_queues[i]);
-   for (i = 0; i < dev->data->nb_tx_queues; i++)
-   rte_free(dev->data->tx_queues[i]);
+   if (dev->data->rx_queues)


This implies that rx_queues is already allocated, so I don't think we
need this check.



+   for (i = 0; i < dev->data->nb_r

[dpdk-dev] [PATCH v6 2/4] net/virtio: skip device probe in vdpa mode

2018-04-12 Thread Xiao Wang
If we want a virtio device to work in vDPA (vhost data path acceleration)
mode, we could add a "vdpa=1" devarg for this device to specify the mode.

This patch let virtio pmd skip device probe when detecting this parameter.

Signed-off-by: Xiao Wang 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---
 doc/guides/nics/virtio.rst | 13 
 drivers/net/virtio/virtio_ethdev.c | 43 ++
 2 files changed, 56 insertions(+)

diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
index ca09cd203..8922f9c0b 100644
--- a/doc/guides/nics/virtio.rst
+++ b/doc/guides/nics/virtio.rst
@@ -318,3 +318,16 @@ Here we use l3fwd-power as an example to show how to get 
started.
 
 $ l3fwd-power -l 0-1 -- -p 1 -P --config="(0,0,1)" \
--no-numa --parse-ptype
+
+
+Virtio PMD arguments
+
+
+The user can specify the below argument in devargs.
+
+#.  ``vdpa``:
+
+A virtio device can also be driven by a vDPA (vhost data path acceleration)
+driver and work as a HW vhost backend. This argument is used to specify
+that a virtio device needs to work in vDPA mode.
+(Default: 0 (disabled))
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 11f758929..6d6c50e89 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include <rte_kvargs.h>
 
 #include "virtio_ethdev.h"
 #include "virtio_pci.h"
@@ -1708,9 +1709,51 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
return 0;
 }
 
+static int vdpa_check_handler(__rte_unused const char *key,
+   const char *value, __rte_unused void *opaque)
+{
+   if (strcmp(value, "1"))
+   return -1;
+
+   return 0;
+}
+
+static int
+vdpa_mode_selected(struct rte_devargs *devargs)
+{
+   struct rte_kvargs *kvlist;
+   const char *key = "vdpa";
+   int ret = 0;
+
+   if (devargs == NULL)
+   return 0;
+
+   kvlist = rte_kvargs_parse(devargs->args, NULL);
+   if (kvlist == NULL)
+   return 0;
+
+   if (!rte_kvargs_count(kvlist, key))
+   goto exit;
+
+   /* vdpa mode selected when there's a key-value pair: vdpa=1 */
+   if (rte_kvargs_process(kvlist, key,
+   vdpa_check_handler, NULL) < 0) {
+   goto exit;
+   }
+   ret = 1;
+
+exit:
+   rte_kvargs_free(kvlist);
+   return ret;
+}
+
 static int eth_virtio_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
struct rte_pci_device *pci_dev)
 {
+   /* virtio pmd skips probe if device needs to work in vdpa mode */
+   if (vdpa_mode_selected(pci_dev->device.devargs))
+   return 1;
+
return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct virtio_hw),
eth_virtio_dev_init);
 }
-- 
2.15.1
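For reference, the devarg matching done by ``vdpa_mode_selected()`` and ``vdpa_check_handler()`` above can be sketched without the rte_kvargs dependency. ``vdpa_requested()`` below is a hypothetical helper for illustration only — the real PMD goes through ``rte_kvargs_parse()`` and ``rte_kvargs_process()`` as shown in the patch — but it accepts the same syntax: an exact ``vdpa=1`` key/value pair in a comma-separated devargs string.

```c
#include <string.h>

/*
 * Minimal sketch of the devarg check, scanning a "key=value,key=value"
 * string for an exact "vdpa=1" pair. vdpa_requested() is a hypothetical
 * helper, not a DPDK API.
 */
static int
vdpa_requested(const char *devargs)
{
    const char *p = devargs;

    if (devargs == NULL)
        return 0;

    while (*p != '\0') {
        /* A key/value pair starts here; check for the "vdpa" key. */
        if (strncmp(p, "vdpa=", 5) == 0)
            /* Like vdpa_check_handler(): only the value "1" selects
             * vDPA mode. */
            return p[5] == '1' && (p[6] == '\0' || p[6] == ',');
        /* Advance to the pair after the next comma, if any. */
        p = strchr(p, ',');
        if (p == NULL)
            break;
        p++;
    }
    return 0;
}
```

With this syntax, a device bound to vDPA mode would be launched with a devargs string such as ``0000:xx:yy.z,vdpa=1`` (bus address hypothetical).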



Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Maxime Coquelin



On 04/12/2018 09:21 AM, Tan, Jianfeng wrote:



On 4/12/2018 1:02 AM, Junjie Chen wrote:

dev_start sets *dev_attached* after setup queues, this sets device to
invalid state since no frontend is attached. Also destroy_device set
*started* to zero which makes *allow_queuing* always zero until dev_start
get called again. Actually, we should not determine queues existence by
*dev_attached* but by queues pointers or other separated variable(s).

Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
dynamically")

Signed-off-by: Junjie Chen 
Tested-by: Jens Freimann 


Overall, looks great to me except a nit below.

Reviewed-by: Jianfeng Tan 


Thanks Jianfeng, I can handle the small change while applying.

Can you confirm that it is implied that the queues are already allocated,
else we wouldn't find the internal resource and would quit earlier (in case
eth_dev_close is called twice, for example)?

Thanks,
Maxime




---
Changes in v3:
- remove useless log in queue status showing
Changes in v2:
- use started to determine vhost queues readiness
- revert setting started to zero in destroy_device
  drivers/net/vhost/rte_eth_vhost.c | 59 
+++

  1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c

index 11b6076..e392d71 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -528,10 +528,11 @@ update_queuing_status(struct rte_eth_dev *dev)
  unsigned int i;
  int allow_queuing = 1;
-    if (rte_atomic32_read(&internal->dev_attached) == 0)
+    if (!dev->data->rx_queues || !dev->data->tx_queues)
  return;
-    if (rte_atomic32_read(&internal->started) == 0)
+    if (rte_atomic32_read(&internal->started) == 0 ||
+    rte_atomic32_read(&internal->dev_attached) == 0)
  allow_queuing = 0;
  /* Wait until rx/tx_pkt_burst stops accessing vhost device */
@@ -607,13 +608,10 @@ new_device(int vid)
  #endif
  internal->vid = vid;
-    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+    if (rte_atomic32_read(&internal->started) == 1)
  queue_setup(eth_dev, internal);
-    rte_atomic32_set(&internal->dev_attached, 1);
-    } else {
-    RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
-    rte_atomic32_set(&internal->dev_attached, 0);
-    }
+    else
+    RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
  for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
  rte_vhost_enable_guest_notification(vid, i, 0);
@@ -622,6 +620,7 @@ new_device(int vid)
  eth_dev->data->dev_link.link_status = ETH_LINK_UP;
+    rte_atomic32_set(&internal->dev_attached, 1);
  update_queuing_status(eth_dev);
  RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid);
@@ -651,23 +650,24 @@ destroy_device(int vid)
  eth_dev = list->eth_dev;
  internal = eth_dev->data->dev_private;
-    rte_atomic32_set(&internal->started, 0);
-    update_queuing_status(eth_dev);
  rte_atomic32_set(&internal->dev_attached, 0);
+    update_queuing_status(eth_dev);
  eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
-    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-    vq = eth_dev->data->rx_queues[i];
-    if (vq == NULL)
-    continue;
-    vq->vid = -1;
-    }
-    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-    vq = eth_dev->data->tx_queues[i];
-    if (vq == NULL)
-    continue;
-    vq->vid = -1;
+    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+    vq = eth_dev->data->rx_queues[i];
+    if (!vq)
+    continue;
+    vq->vid = -1;
+    }
+    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+    vq = eth_dev->data->tx_queues[i];
+    if (!vq)
+    continue;
+    vq->vid = -1;
+    }
  }
  state = vring_states[eth_dev->data->port_id];
@@ -792,11 +792,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
  {
  struct pmd_internal *internal = eth_dev->data->dev_private;
-    if (unlikely(rte_atomic32_read(&internal->dev_attached) == 0)) {
-    queue_setup(eth_dev, internal);
-    rte_atomic32_set(&internal->dev_attached, 1);
-    }
-
+    queue_setup(eth_dev, internal);
  rte_atomic32_set(&internal->started, 1);
  update_queuing_status(eth_dev);
@@ -836,10 +832,13 @@ eth_dev_close(struct rte_eth_dev *dev)
  pthread_mutex_unlock(&internal_list_lock);
  rte_free(list);
-    for (i = 0; i < dev->data->nb_rx_queues; i++)
-    rte_free(dev->data->rx_queues[i]);
-    for (i = 0; i < dev->data->nb_tx_queues; i++)
-    rte_free(dev->data->tx_queues[i]);
+    if (dev->data->rx_queues)


This implies that rx_queues is already allocated, so I don't think we
need this check.



+    for (i = 0; i < dev->data->nb_rx_queues; i++)
+    rte_free(dev->da
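The net effect of the patch is that queue allocation, the start/stop state and frontend attachment become independent conditions, with the rx/tx burst path enabled only when both atomic flags are set. A hedged sketch of that gating (``struct pmd_state`` and ``allow_queuing()`` are illustrative names, not the driver's real ``pmd_internal``):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Illustrative model of the fixed gating logic: "started" is toggled by
 * dev_start()/dev_stop(), "dev_attached" by new_device()/destroy_device().
 * Queue pointers are checked separately, before either flag is consulted.
 */
struct pmd_state {
    atomic_int started;      /* dev_start() sets 1, dev_stop() sets 0 */
    atomic_int dev_attached; /* new_device() sets 1, destroy_device() sets 0 */
};

static bool
allow_queuing(const struct pmd_state *s)
{
    /* Datapath runs only when the port is started AND a frontend is
     * attached, matching the OR'd condition in update_queuing_status(). */
    return atomic_load(&s->started) == 1 &&
           atomic_load(&s->dev_attached) == 1;
}
```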

[dpdk-dev] [PATCH v3 0/5] introduce new tunnel types

2018-04-12 Thread Xueming Li
v3:
- Change VXLAN-GPE definition order to avoid ABI compatibility issue.
v2:
- Split the patch set into two series (public and mlx5); this one is the first.
v1:
- Support new tunnel type MPLS-in-GRE and MPLS-in-UDP
- Remove deprecation notes of rss level

This patchset introduces new tunnel types and related testpmd code:
- New tunnel type VXLAN-GPE
  https://datatracker.ietf.org/doc/draft-ietf-nvo3-vxlan-gpe/
- New tunnel type MPLS-in-GRE
  https://tools.ietf.org/html/rfc4023
- New tunnel type MPLS-in-UDP
  https://tools.ietf.org/html/rfc7510
- Support GRE extension in testpmd csum forwarding engine


Xueming Li (5):
  doc: remove RSS configuration change announcement
  ethdev: introduce new tunnel VXLAN-GPE
  ethdev: introduce tunnel type MPLS-in-GRE and MPLS-in-UDP
  app/testpmd: introduce new tunnel VXLAN-GPE
  app/testpmd: add more GRE extension support to csum engine

 app/test-pmd/cmdline_flow.c   |  24 
 app/test-pmd/config.c |   2 +
 app/test-pmd/csumonly.c   | 103 ++
 app/test-pmd/parameters.c |  12 +++-
 app/test-pmd/testpmd.h|   2 +
 doc/guides/prog_guide/rte_flow.rst|  12 
 doc/guides/rel_notes/deprecation.rst  |   4 --
 doc/guides/testpmd_app_ug/run_app.rst |   5 ++
 lib/librte_ether/rte_eth_ctrl.h   |   3 +-
 lib/librte_ether/rte_flow.c   |   1 +
 lib/librte_ether/rte_flow.h   |  27 +
 lib/librte_mbuf/rte_mbuf.c|   3 +
 lib/librte_mbuf/rte_mbuf.h|   1 +
 lib/librte_mbuf/rte_mbuf_ptype.c  |   3 +
 lib/librte_mbuf/rte_mbuf_ptype.h  |  47 
 lib/librte_net/rte_ether.h|  25 +
 16 files changed, 257 insertions(+), 17 deletions(-)

-- 
2.13.3



[dpdk-dev] [PATCH v3 1/5] doc: remove RSS configuration change announcement

2018-04-12 Thread Xueming Li
Remove the deprecation notice, as the implementation of RSS level is
provided in Adrien's patch set: http://www.dpdk.org/dev/patchwork/patch/37399/

Signed-off-by: Xueming Li 
Acked-by: Adrien Mazarguil 
---
 doc/guides/rel_notes/deprecation.rst | 4 
 1 file changed, 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index ec70b5fa9..8b8af47e3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -108,10 +108,6 @@ Deprecation Notices
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
 
-* ethdev: A new rss level field planned in 18.05.
-  The new API add rss_level field to ``rte_eth_rss_conf`` to enable a choice
-  of RSS hash calculation on outer or inner header of tunneled packet.
-
 * ethdev:  Currently, if the  rte_eth_rx_burst() function returns a value less
   than *nb_pkts*, the application will assume that no more packets are present.
   Some of the hw queue based hardware can only support smaller burst for RX
-- 
2.13.3



[dpdk-dev] [PATCH v3 2/5] ethdev: introduce new tunnel VXLAN-GPE

2018-04-12 Thread Xueming Li
VXLAN-GPE enables VXLAN for all protocols. Protocol link:
https://www.ietf.org/id/draft-ietf-nvo3-vxlan-gpe-05.txt

Signed-off-by: Xueming Li 
---
 doc/guides/prog_guide/rte_flow.rst | 12 
 lib/librte_ether/rte_eth_ctrl.h|  3 ++-
 lib/librte_ether/rte_flow.c|  1 +
 lib/librte_ether/rte_flow.h| 27 +++
 lib/librte_mbuf/rte_mbuf.c |  3 +++
 lib/librte_mbuf/rte_mbuf.h |  1 +
 lib/librte_mbuf/rte_mbuf_ptype.c   |  1 +
 lib/librte_mbuf/rte_mbuf_ptype.h   | 13 +
 lib/librte_net/rte_ether.h | 25 +
 9 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index 91dbd61a0..9d92d4e1e 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1044,6 +1044,18 @@ Matches a GENEVE header.
 - ``rsvd1``: reserved, normally 0x00.
 - Default ``mask`` matches VNI only.
 
+Item: ``VXLAN-GPE``
+^^^^^^^^^^^^^^^^^^^
+
+Matches a VXLAN-GPE header (draft-ietf-nvo3-vxlan-gpe-05).
+
+- ``flags``: normally 0x0C (I and P flag).
+- ``rsvd0``: reserved, normally 0x0000.
+- ``protocol``: protocol type.
+- ``vni``: VXLAN network identifier.
+- ``rsvd1``: reserved, normally 0x00.
+- Default ``mask`` matches VNI only.
+
 Actions
 ~~~
 
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 668f59acb..5ea8ae24c 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -54,7 +54,8 @@ extern "C" {
 #define RTE_ETH_FLOW_VXLAN  19 /**< VXLAN protocol based flow */
 #define RTE_ETH_FLOW_GENEVE 20 /**< GENEVE protocol based flow */
 #define RTE_ETH_FLOW_NVGRE  21 /**< NVGRE protocol based flow */
-#define RTE_ETH_FLOW_MAX        22
+#define RTE_ETH_FLOW_VXLAN_GPE  22 /**< VXLAN-GPE protocol based flow */
+#define RTE_ETH_FLOW_MAX        23
 
 /**
  * Feature filter types
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index 3d8116ebd..58ec80f42 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -55,6 +55,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = 
{
MK_FLOW_ITEM(E_TAG, sizeof(struct rte_flow_item_e_tag)),
MK_FLOW_ITEM(NVGRE, sizeof(struct rte_flow_item_nvgre)),
MK_FLOW_ITEM(GENEVE, sizeof(struct rte_flow_item_geneve)),
+   MK_FLOW_ITEM(VXLAN_GPE, sizeof(struct rte_flow_item_vxlan_gpe)),
 };
 
 /** Generate flow_action[] entry. */
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index bed727df8..fefd69920 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -335,6 +335,13 @@ enum rte_flow_item_type {
 * See struct rte_flow_item_geneve.
 */
RTE_FLOW_ITEM_TYPE_GENEVE,
+
+   /**
+* Matches a VXLAN-GPE header (draft-ietf-nvo3-vxlan-gpe-05).
+*
+* See struct rte_flow_item_vxlan_gpe.
+*/
+   RTE_FLOW_ITEM_TYPE_VXLAN_GPE,
 };
 
 /**
@@ -864,6 +871,26 @@ static const struct rte_flow_item_geneve 
rte_flow_item_geneve_mask = {
 #endif
 
 /**
+ * RTE_FLOW_ITEM_TYPE_VXLAN_GPE.
+ *
+ * Matches a VXLAN-GPE header.
+ */
+struct rte_flow_item_vxlan_gpe {
+   uint8_t flags; /**< Normally 0x0c (I and P flag). */
+   uint8_t rsvd0[2]; /**< Reserved, normally 0x0000. */
+   uint8_t protocol; /**< Protocol type. */
+   uint8_t vni[3]; /**< VXLAN identifier. */
+   uint8_t rsvd1; /**< Reserved, normally 0x00. */
+};
+
+/** Default mask for RTE_FLOW_ITEM_TYPE_VXLAN_GPE. */
+#ifndef __cplusplus
+static const struct rte_flow_item_vxlan_gpe rte_flow_item_vxlan_gpe_mask = {
+   .vni = "\xff\xff\xff",
+};
+#endif
+
+/**
  * Matching pattern item definition.
  *
  * A pattern is formed by stacking items starting from the lowest protocol
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..dc90379e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -405,6 +405,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
case PKT_TX_TUNNEL_IPIP: return "PKT_TX_TUNNEL_IPIP";
case PKT_TX_TUNNEL_GENEVE: return "PKT_TX_TUNNEL_GENEVE";
case PKT_TX_TUNNEL_MPLSINUDP: return "PKT_TX_TUNNEL_MPLSINUDP";
+   case PKT_TX_TUNNEL_VXLAN_GPE: return "PKT_TX_TUNNEL_VXLAN_GPE";
case PKT_TX_MACSEC: return "PKT_TX_MACSEC";
case PKT_TX_SEC_OFFLOAD: return "PKT_TX_SEC_OFFLOAD";
default: return NULL;
@@ -439,6 +440,8 @@ rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t 
buflen)
  "PKT_TX_TUNNEL_NONE" },
{ PKT_TX_TUNNEL_MPLSINUDP, PKT_TX_TUNNEL_MASK,
  "PKT_TX_TUNNEL_NONE" },
+   { PKT_TX_TUNNEL_VXLAN_GPE, PKT_TX_TUNNEL_MASK,
+ "PKT_TX_TUNNEL_NONE" },
{ PKT_TX_MACSEC, PKT_TX_MACSEC, NULL },
{ PKT_TX_SEC_OFFLO
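As a quick illustration of the header layout the new item matches, here is a sketch of the 8-byte VXLAN-GPE header from the patch together with a helper that assembles the 24-bit VNI from its three network-order bytes. ``struct vxlan_gpe`` mirrors ``rte_flow_item_vxlan_gpe``, but ``vxlan_gpe_vni()`` is illustrative only and not part of the rte_flow API.

```c
#include <stdint.h>

/* VXLAN-GPE header per draft-ietf-nvo3-vxlan-gpe-05. */
struct vxlan_gpe {
    uint8_t flags;    /* normally 0x0c: I (VNI valid) and P (next proto) */
    uint8_t rsvd0[2];
    uint8_t protocol; /* next protocol, e.g. 0x01 IPv4, 0x02 IPv6 */
    uint8_t vni[3];   /* VXLAN network identifier, network byte order */
    uint8_t rsvd1;
};

/* Assemble the 24-bit VNI from its big-endian bytes. */
static uint32_t
vxlan_gpe_vni(const struct vxlan_gpe *h)
{
    return ((uint32_t)h->vni[0] << 16) |
           ((uint32_t)h->vni[1] << 8) |
            (uint32_t)h->vni[2];
}
```

This is also why the default mask covers only the ``vni`` bytes: flags and reserved fields are normally left unmatched.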

[dpdk-dev] [PATCH v3 5/5] app/testpmd: add more GRE extension support to csum engine

2018-04-12 Thread Xueming Li
This patch adds GRE checksum and sequence extension support, in addition
to the existing key extension, to the csum forwarding engine.

Signed-off-by: Xueming Li 
---
 app/test-pmd/csumonly.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index d98c51648..d32fb70e2 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -49,9 +49,12 @@
 #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
 #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
 
-#define GRE_KEY_PRESENT 0x2000
-#define GRE_KEY_LEN 4
-#define GRE_SUPPORTED_FIELDS GRE_KEY_PRESENT
+#define GRE_CHECKSUM_PRESENT   0x8000
+#define GRE_KEY_PRESENT0x2000
+#define GRE_SEQUENCE_PRESENT   0x1000
+#define GRE_EXT_LEN4
+#define GRE_SUPPORTED_FIELDS   (GRE_CHECKSUM_PRESENT | GRE_KEY_PRESENT |\
+GRE_SEQUENCE_PRESENT)
 
 /* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
@@ -269,14 +272,14 @@ parse_gre(struct simple_gre_hdr *gre_hdr, struct 
testpmd_offload_info *info)
struct ipv6_hdr *ipv6_hdr;
uint8_t gre_len = 0;
 
-   /* check which fields are supported */
-   if ((gre_hdr->flags & _htons(~GRE_SUPPORTED_FIELDS)) != 0)
-   return;
-
gre_len += sizeof(struct simple_gre_hdr);
 
if (gre_hdr->flags & _htons(GRE_KEY_PRESENT))
-   gre_len += GRE_KEY_LEN;
+   gre_len += GRE_EXT_LEN;
+   if (gre_hdr->flags & _htons(GRE_SEQUENCE_PRESENT))
+   gre_len += GRE_EXT_LEN;
+   if (gre_hdr->flags & _htons(GRE_CHECKSUM_PRESENT))
+   gre_len += GRE_EXT_LEN;
 
if (gre_hdr->proto == _htons(ETHER_TYPE_IPv4)) {
info->is_tunnel = 1;
@@ -815,6 +818,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
/* step 3: fill the mbuf meta data (flags and header lengths) */
 
+   m->tx_offload = 0;
if (info.is_tunnel == 1) {
if (info.tunnel_tso_segsz ||
(tx_offloads &
-- 
2.13.3
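The length computation in the patch amounts to: a 4-byte base GRE header grows by one 4-byte word per optional field — checksum (together with reserved1, per RFC 2890), key, and sequence number. A minimal host-order sketch with the same flag values; ``gre_header_len()`` is an illustrative helper, not testpmd code:

```c
#include <stdint.h>

/* Same optional-field flags as the patch, but in host byte order. */
#define GRE_CHECKSUM_PRESENT 0x8000
#define GRE_KEY_PRESENT      0x2000
#define GRE_SEQUENCE_PRESENT 0x1000
#define GRE_BASE_LEN         4 /* flags/version (2) + protocol (2) */
#define GRE_EXT_LEN          4 /* each optional field is one 32-bit word */

/* Return the GRE header length implied by the flag bits, mirroring the
 * per-flag accumulation done in parse_gre() after this patch. */
static uint8_t
gre_header_len(uint16_t flags)
{
    uint8_t len = GRE_BASE_LEN;

    if (flags & GRE_CHECKSUM_PRESENT)
        len += GRE_EXT_LEN; /* checksum (2) + reserved1 (2) */
    if (flags & GRE_KEY_PRESENT)
        len += GRE_EXT_LEN;
    if (flags & GRE_SEQUENCE_PRESENT)
        len += GRE_EXT_LEN;
    return len;
}
```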



[dpdk-dev] [PATCH v3 3/5] ethdev: introduce tunnel type MPLS-in-GRE and MPLS-in-UDP

2018-04-12 Thread Xueming Li
This patch adds new tunnel types for MPLS-in-GRE and MPLS-in-UDP.

MPLS-in-GRE protocol link:
https://tools.ietf.org/html/rfc4023

MPLS-in-UDP protocol link:
https://tools.ietf.org/html/rfc7510

Signed-off-by: Xueming Li 
Acked-by: Adrien Mazarguil 
---
 lib/librte_mbuf/rte_mbuf_ptype.c |  2 ++
 lib/librte_mbuf/rte_mbuf_ptype.h | 34 ++
 2 files changed, 36 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf_ptype.c b/lib/librte_mbuf/rte_mbuf_ptype.c
index 49106c7df..10abfe89c 100644
--- a/lib/librte_mbuf/rte_mbuf_ptype.c
+++ b/lib/librte_mbuf/rte_mbuf_ptype.c
@@ -66,6 +66,8 @@ const char *rte_get_ptype_tunnel_name(uint32_t ptype)
case RTE_PTYPE_TUNNEL_ESP: return "TUNNEL_ESP";
case RTE_PTYPE_TUNNEL_L2TP: return "TUNNEL_L2TP";
case RTE_PTYPE_TUNNEL_VXLAN_GPE: return "TUNNEL_VXLAN_GPE";
+   case RTE_PTYPE_TUNNEL_MPLS_IN_UDP: return "TUNNEL_MPLS-IN-UDP";
+   case RTE_PTYPE_TUNNEL_MPLS_IN_GRE: return "TUNNEL_MPLS-IN-GRE";
default: return "TUNNEL_UNKNOWN";
}
 }
diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h
index 7caf83312..79ea31425 100644
--- a/lib/librte_mbuf/rte_mbuf_ptype.h
+++ b/lib/librte_mbuf/rte_mbuf_ptype.h
@@ -436,6 +436,40 @@ extern "C" {
  */
 #define RTE_PTYPE_TUNNEL_VXLAN_GPE  0xb000
 /**
+ * MPLS-in-GRE tunneling packet type (RFC 4023).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47
+ * | 'protocol'=0x8847>
+ * or,
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47
+ * | 'protocol'=0x8848>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47
+ * | 'protocol'=0x8847>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47
+ * | 'protocol'=0x8848>
+ */
+#define RTE_PTYPE_TUNNEL_MPLS_IN_GRE   0xc000
+/**
+ * MPLS-in-UDP tunneling packet type (RFC 7510).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=6635>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=6635>
+ */
+#define RTE_PTYPE_TUNNEL_MPLS_IN_UDP  0xd000
+/**
  * Mask of tunneling packet types.
  */
 #define RTE_PTYPE_TUNNEL_MASK   0xf000
-- 
2.13.3
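Per the packet formats documented above, a parser distinguishes the two new tunnel types by the UDP destination port (6635 for MPLS-in-UDP, per RFC 7510) and by the GRE protocol field (MPLS ethertypes 0x8847/0x8848, per RFC 4023). A hedged sketch with host-order inputs; both helper names are illustrative, not DPDK APIs:

```c
#include <stdbool.h>
#include <stdint.h>

#define MPLS_IN_UDP_DST_PORT 6635   /* RFC 7510 */
#define ETHERTYPE_MPLS_UC    0x8847 /* RFC 4023, unicast MPLS */
#define ETHERTYPE_MPLS_MC    0x8848 /* RFC 4023, multicast MPLS */

/* MPLS-in-UDP: identified by the well-known UDP destination port. */
static bool
is_mpls_in_udp(uint16_t udp_dst_port)
{
    return udp_dst_port == MPLS_IN_UDP_DST_PORT;
}

/* MPLS-in-GRE: identified by the MPLS ethertype in the GRE protocol field. */
static bool
is_mpls_in_gre(uint16_t gre_proto)
{
    return gre_proto == ETHERTYPE_MPLS_UC ||
           gre_proto == ETHERTYPE_MPLS_MC;
}
```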



Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Chen, Junjie J
> 
> 
> 
> On 04/12/2018 09:21 AM, Tan, Jianfeng wrote:
> >
> >
> > On 4/12/2018 1:02 AM, Junjie Chen wrote:
> >> dev_start sets *dev_attached* after setup queues, this sets device to
> >> invalid state since no frontend is attached. Also destroy_device set
> >> *started* to zero which makes *allow_queuing* always zero until
> >> dev_start get called again. Actually, we should not determine queues
> >> existence by
> >> *dev_attached* but by queues pointers or other separated variable(s).
> >>
> >> Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
> >> dynamically")
> >>
> >> Signed-off-by: Junjie Chen 
> >> Tested-by: Jens Freimann 
> >
> > Overall, looks great to me except a nit below.
> >
> > Reviewed-by: Jianfeng Tan 
> 
> Thanks Jianfeng, I can handle the small change while applying.
> 
> Can you confirm that it is implied that the queues are already allocated,
> else we wouldn't find the internal resource and would quit earlier (in case
> eth_dev_close is called twice, for example)?

That is required; otherwise it generates a segfault if we close the device
before queue setup. For example, we execute the following steps in testpmd:
1. port attach
2. ctrl+D

> 
> Thanks,
> Maxime
> 
> >
> >> ---
> >> Changes in v3:
> >> - remove useless log in queue status showing Changes in v2:
> >> - use started to determine vhost queues readiness
> >> - revert setting started to zero in destroy_device
> >>   drivers/net/vhost/rte_eth_vhost.c | 59
> >> +++
> >>   1 file changed, 29 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> >> b/drivers/net/vhost/rte_eth_vhost.c
> >> index 11b6076..e392d71 100644
> >> --- a/drivers/net/vhost/rte_eth_vhost.c
> >> +++ b/drivers/net/vhost/rte_eth_vhost.c
> >> @@ -528,10 +528,11 @@ update_queuing_status(struct rte_eth_dev
> *dev)
> >>   unsigned int i;
> >>   int allow_queuing = 1;
> >> -    if (rte_atomic32_read(&internal->dev_attached) == 0)
> >> +    if (!dev->data->rx_queues || !dev->data->tx_queues)
> >>   return;
> >> -    if (rte_atomic32_read(&internal->started) == 0)
> >> +    if (rte_atomic32_read(&internal->started) == 0 ||
> >> +    rte_atomic32_read(&internal->dev_attached) == 0)
> >>   allow_queuing = 0;
> >>   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
> >> @@ -607,13 +608,10 @@ new_device(int vid)
> >>   #endif
> >>   internal->vid = vid;
> >> -    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> >> +    if (rte_atomic32_read(&internal->started) == 1)
> >>   queue_setup(eth_dev, internal);
> >> -    rte_atomic32_set(&internal->dev_attached, 1);
> >> -    } else {
> >> -    RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
> >> -    rte_atomic32_set(&internal->dev_attached, 0);
> >> -    }
> >> +    else
> >> +    RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
> >>   for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
> >>   rte_vhost_enable_guest_notification(vid, i, 0); @@ -622,6
> >> +620,7 @@ new_device(int vid)
> >>   eth_dev->data->dev_link.link_status = ETH_LINK_UP;
> >> +    rte_atomic32_set(&internal->dev_attached, 1);
> >>   update_queuing_status(eth_dev);
> >>   RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid); @@
> >> -651,23 +650,24 @@ destroy_device(int vid)
> >>   eth_dev = list->eth_dev;
> >>   internal = eth_dev->data->dev_private;
> >> -    rte_atomic32_set(&internal->started, 0);
> >> -    update_queuing_status(eth_dev);
> >>   rte_atomic32_set(&internal->dev_attached, 0);
> >> +    update_queuing_status(eth_dev);
> >>   eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
> >> -    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >> -    vq = eth_dev->data->rx_queues[i];
> >> -    if (vq == NULL)
> >> -    continue;
> >> -    vq->vid = -1;
> >> -    }
> >> -    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> >> -    vq = eth_dev->data->tx_queues[i];
> >> -    if (vq == NULL)
> >> -    continue;
> >> -    vq->vid = -1;
> >> +    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> >> +    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >> +    vq = eth_dev->data->rx_queues[i];
> >> +    if (!vq)
> >> +    continue;
> >> +    vq->vid = -1;
> >> +    }
> >> +    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> >> +    vq = eth_dev->data->tx_queues[i];
> >> +    if (!vq)
> >> +    continue;
> >> +    vq->vid = -1;
> >> +    }
> >>   }
> >>   state = vring_states[eth_dev->data->port_id];
> >> @@ -792,11 +792,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
> >>   {
> >>   struct pmd_internal *internal = eth_dev->data->dev_private;
> >> -    if (unlikely(rte_atomic32_read(&internal->dev_attached) == 0)) {
> >> -    queue_setup(eth_dev, internal);
> >> -    rte_a

[dpdk-dev] [PATCH v3 4/5] app/testpmd: introduce new tunnel VXLAN-GPE

2018-04-12 Thread Xueming Li
Add VXLAN-GPE support to csum forwarding engine and rte flow.

Signed-off-by: Xueming Li 
---
 app/test-pmd/cmdline_flow.c   | 24 ++
 app/test-pmd/config.c |  2 +
 app/test-pmd/csumonly.c   | 83 +--
 app/test-pmd/parameters.c | 12 -
 app/test-pmd/testpmd.h|  2 +
 doc/guides/testpmd_app_ug/run_app.rst |  5 +++
 6 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index f85c1c57f..0d3c62599 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -154,6 +154,8 @@ enum index {
ITEM_GENEVE,
ITEM_GENEVE_VNI,
ITEM_GENEVE_PROTO,
+   ITEM_VXLAN_GPE,
+   ITEM_VXLAN_GPE_VNI,
 
/* Validate/create actions. */
ACTIONS,
@@ -470,6 +472,7 @@ static const enum index next_item[] = {
ITEM_GTPC,
ITEM_GTPU,
ITEM_GENEVE,
+   ITEM_VXLAN_GPE,
ZERO,
 };
 
@@ -626,6 +629,12 @@ static const enum index item_geneve[] = {
ZERO,
 };
 
+static const enum index item_vxlan_gpe[] = {
+   ITEM_VXLAN_GPE_VNI,
+   ITEM_NEXT,
+   ZERO,
+};
+
 static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -1560,6 +1569,21 @@ static const struct token token_list[] = {
.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_geneve,
 protocol)),
},
+   [ITEM_VXLAN_GPE] = {
+   .name = "vxlan-gpe",
+   .help = "match VXLAN-GPE header",
+   .priv = PRIV_ITEM(VXLAN_GPE,
+ sizeof(struct rte_flow_item_vxlan_gpe)),
+   .next = NEXT(item_vxlan_gpe),
+   .call = parse_vc,
+   },
+   [ITEM_VXLAN_GPE_VNI] = {
+   .name = "vni",
+   .help = "VXLAN-GPE identifier",
+   .next = NEXT(item_vxlan_gpe, NEXT_ENTRY(UNSIGNED), item_param),
+   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_vxlan_gpe,
+vni)),
+   },
 
/* Validate/create actions. */
[ACTIONS] = {
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 4a273eff7..349eb9015 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -981,6 +981,7 @@ static const struct {
MK_FLOW_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
MK_FLOW_ITEM(GTPU, sizeof(struct rte_flow_item_gtp)),
MK_FLOW_ITEM(GENEVE, sizeof(struct rte_flow_item_geneve)),
+   MK_FLOW_ITEM(VXLAN_GPE, sizeof(struct rte_flow_item_vxlan_gpe)),
 };
 
 /** Pattern item specification types. */
@@ -3082,6 +3083,7 @@ flowtype_to_str(uint16_t flow_type)
{"vxlan", RTE_ETH_FLOW_VXLAN},
{"geneve", RTE_ETH_FLOW_GENEVE},
{"nvgre", RTE_ETH_FLOW_NVGRE},
+   {"vxlan-gpe", RTE_ETH_FLOW_VXLAN_GPE},
};
 
for (i = 0; i < RTE_DIM(flowtype_str_table); i++) {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 5f5ab64aa..d98c51648 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -60,6 +60,8 @@
 #define _htons(x) (x)
 #endif
 
+uint16_t vxlan_gpe_udp_port = 4790;
+
 /* structure that caches offload info for the current packet */
 struct testpmd_offload_info {
uint16_t ethertype;
@@ -194,6 +196,70 @@ parse_vxlan(struct udp_hdr *udp_hdr,
info->l2_len += ETHER_VXLAN_HLEN; /* add udp + vxlan */
 }
 
+/* Parse a vxlan-gpe header */
+static void
+parse_vxlan_gpe(struct udp_hdr *udp_hdr,
+   struct testpmd_offload_info *info)
+{
+   struct ether_hdr *eth_hdr;
+   struct ipv4_hdr *ipv4_hdr;
+   struct ipv6_hdr *ipv6_hdr;
+   struct vxlan_gpe_hdr *vxlan_gpe_hdr;
+   uint8_t vxlan_gpe_len = sizeof(*vxlan_gpe_hdr);
+
+   /* Check udp destination port. */
+   if (udp_hdr->dst_port != _htons(vxlan_gpe_udp_port))
+   return;
+
+   vxlan_gpe_hdr = (struct vxlan_gpe_hdr *)((char *)udp_hdr +
+   sizeof(struct udp_hdr));
+
+   if (!vxlan_gpe_hdr->proto || vxlan_gpe_hdr->proto ==
+   VXLAN_GPE_TYPE_IPv4) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+   info->outer_l4_proto = info->l4_proto;
+
+   ipv4_hdr = (struct ipv4_hdr *)((char *)vxlan_gpe_hdr +
+  vxlan_gpe_len);
+
+   parse_ipv4(ipv4_hdr, info);
+   info->ethertype = _htons(ETHER_TYPE_IPv4);
+   info->l2_len = 0;
+
+   } else if (vxlan_gpe_hdr->proto == VXLAN_GPE_TYPE_IPv6) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info

Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Maxime Coquelin



On 04/12/2018 09:34 AM, Chen, Junjie J wrote:




On 04/12/2018 09:21 AM, Tan, Jianfeng wrote:



On 4/12/2018 1:02 AM, Junjie Chen wrote:

dev_start sets *dev_attached* after setup queues, this sets device to
invalid state since no frontend is attached. Also destroy_device set
*started* to zero which makes *allow_queuing* always zero until
dev_start get called again. Actually, we should not determine queues
existence by
*dev_attached* but by queues pointers or other separated variable(s).

Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
dynamically")

Signed-off-by: Junjie Chen 
Tested-by: Jens Freimann 


Overall, looks great to me except a nit below.

Reviewed-by: Jianfeng Tan 


Thanks Jianfeng, I can handle the small change while applying.

Can you confirm that it is implied that the queues are already allocated,
else we wouldn't find the internal resource and would quit earlier (in case
eth_dev_close is called twice, for example)?


That is required; otherwise it generates a segfault if we close the device
before queue setup. For example, we execute the following steps in testpmd:
1. port attach
2. ctrl+D


Thanks for confirming Junjie, I will apply it as is then.

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Thanks,
Maxime




---
Changes in v3:
- remove useless log in queue status showing Changes in v2:
- use started to determine vhost queues readiness
- revert setting started to zero in destroy_device
   drivers/net/vhost/rte_eth_vhost.c | 59
+++
   1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c
b/drivers/net/vhost/rte_eth_vhost.c
index 11b6076..e392d71 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -528,10 +528,11 @@ update_queuing_status(struct rte_eth_dev

*dev)

   unsigned int i;
   int allow_queuing = 1;
-    if (rte_atomic32_read(&internal->dev_attached) == 0)
+    if (!dev->data->rx_queues || !dev->data->tx_queues)
   return;
-    if (rte_atomic32_read(&internal->started) == 0)
+    if (rte_atomic32_read(&internal->started) == 0 ||
+    rte_atomic32_read(&internal->dev_attached) == 0)
   allow_queuing = 0;
   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
@@ -607,13 +608,10 @@ new_device(int vid)
   #endif
   internal->vid = vid;
-    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+    if (rte_atomic32_read(&internal->started) == 1)
   queue_setup(eth_dev, internal);
-    rte_atomic32_set(&internal->dev_attached, 1);
-    } else {
-    RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
-    rte_atomic32_set(&internal->dev_attached, 0);
-    }
+    else
+    RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
   for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
   rte_vhost_enable_guest_notification(vid, i, 0); @@ -622,6
+620,7 @@ new_device(int vid)
   eth_dev->data->dev_link.link_status = ETH_LINK_UP;
+    rte_atomic32_set(&internal->dev_attached, 1);
   update_queuing_status(eth_dev);
   RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid); @@
-651,23 +650,24 @@ destroy_device(int vid)
   eth_dev = list->eth_dev;
   internal = eth_dev->data->dev_private;
-    rte_atomic32_set(&internal->started, 0);
-    update_queuing_status(eth_dev);
   rte_atomic32_set(&internal->dev_attached, 0);
+    update_queuing_status(eth_dev);
   eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
-    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-    vq = eth_dev->data->rx_queues[i];
-    if (vq == NULL)
-    continue;
-    vq->vid = -1;
-    }
-    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-    vq = eth_dev->data->tx_queues[i];
-    if (vq == NULL)
-    continue;
-    vq->vid = -1;
+    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+    vq = eth_dev->data->rx_queues[i];
+    if (!vq)
+    continue;
+    vq->vid = -1;
+    }
+    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+    vq = eth_dev->data->tx_queues[i];
+    if (!vq)
+    continue;
+    vq->vid = -1;
+    }
   }
   state = vring_states[eth_dev->data->port_id];
@@ -792,11 +792,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
   {
   struct pmd_internal *internal = eth_dev->data->dev_private;
-    if (unlikely(rte_atomic32_read(&internal->dev_attached) == 0)) {
-    queue_setup(eth_dev, internal);
-    rte_atomic32_set(&internal->dev_attached, 1);
-    }
-
+    queue_setup(eth_dev, internal);
   rte_atomic32_set(&internal->started, 1);
   update_queuing_status(eth_dev); @@ -836,10 +832,13 @@
eth_dev_close(struct rte_eth_dev *dev)
   pthread_mutex_unlock(&internal_list_lock);
   rte_free(list);
-    for (i = 0; i < dev->data->nb_rx_queues

Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Tan, Jianfeng



On 4/12/2018 3:29 PM, Maxime Coquelin wrote:



On 04/12/2018 09:21 AM, Tan, Jianfeng wrote:



On 4/12/2018 1:02 AM, Junjie Chen wrote:

dev_start sets *dev_attached* after setting up queues, which puts the device
into an invalid state since no frontend is attached. Also, destroy_device sets
*started* to zero, which keeps *allow_queuing* at zero until dev_start gets
called again. Actually, we should not determine queue existence by
*dev_attached* but by the queue pointers or other separate variable(s).

Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
dynamically")

Signed-off-by: Junjie Chen 
Tested-by: Jens Freimann 


Overall, looks great to me except a nit below.

Reviewed-by: Jianfeng Tan 


Thanks Jianfeng, I can handle the small change while applying.

Can you confirm that it is implied that the queues are already allocated,
else we wouldn't find the internal resource and would quit earlier (in case
eth_dev_close is called twice, for example)?


I was referring to i40e_dev_free_queues() and ixgbe_dev_free_queues().
But on second thought, there is no harm in keeping the check.





Thanks,
Maxime




---
Changes in v3:
- remove useless log in queue status showing
Changes in v2:
- use started to determine vhost queues readiness
- revert setting started to zero in destroy_device
  drivers/net/vhost/rte_eth_vhost.c | 59 
+++

  1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c

index 11b6076..e392d71 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -528,10 +528,11 @@ update_queuing_status(struct rte_eth_dev *dev)
  unsigned int i;
  int allow_queuing = 1;
-if (rte_atomic32_read(&internal->dev_attached) == 0)
+if (!dev->data->rx_queues || !dev->data->tx_queues)
  return;
-if (rte_atomic32_read(&internal->started) == 0)
+if (rte_atomic32_read(&internal->started) == 0 ||
+rte_atomic32_read(&internal->dev_attached) == 0)
  allow_queuing = 0;
  /* Wait until rx/tx_pkt_burst stops accessing vhost device */
@@ -607,13 +608,10 @@ new_device(int vid)
  #endif
  internal->vid = vid;
-if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+if (rte_atomic32_read(&internal->started) == 1)
  queue_setup(eth_dev, internal);
-rte_atomic32_set(&internal->dev_attached, 1);
-} else {
-RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
-rte_atomic32_set(&internal->dev_attached, 0);
-}
+else
+RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
  for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
  rte_vhost_enable_guest_notification(vid, i, 0);
@@ -622,6 +620,7 @@ new_device(int vid)
  eth_dev->data->dev_link.link_status = ETH_LINK_UP;
+rte_atomic32_set(&internal->dev_attached, 1);
  update_queuing_status(eth_dev);
  RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid);
@@ -651,23 +650,24 @@ destroy_device(int vid)
  eth_dev = list->eth_dev;
  internal = eth_dev->data->dev_private;
-rte_atomic32_set(&internal->started, 0);
-update_queuing_status(eth_dev);
  rte_atomic32_set(&internal->dev_attached, 0);
+update_queuing_status(eth_dev);
  eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
-for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-vq = eth_dev->data->rx_queues[i];
-if (vq == NULL)
-continue;
-vq->vid = -1;
-}
-for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-vq = eth_dev->data->tx_queues[i];
-if (vq == NULL)
-continue;
-vq->vid = -1;
+if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
+for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+vq = eth_dev->data->rx_queues[i];
+if (!vq)
+continue;
+vq->vid = -1;
+}
+for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+vq = eth_dev->data->tx_queues[i];
+if (!vq)
+continue;
+vq->vid = -1;
+}
  }
  state = vring_states[eth_dev->data->port_id];
@@ -792,11 +792,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
  {
  struct pmd_internal *internal = eth_dev->data->dev_private;
-if (unlikely(rte_atomic32_read(&internal->dev_attached) == 0)) {
-queue_setup(eth_dev, internal);
-rte_atomic32_set(&internal->dev_attached, 1);
-}
-
+queue_setup(eth_dev, internal);
  rte_atomic32_set(&internal->started, 1);
  update_queuing_status(eth_dev);
@@ -836,10 +832,13 @@ eth_dev_close(struct rte_eth_dev *dev)
  pthread_mutex_unlock(&internal_list_lock);
  rte_free(list);
-for (i = 0; i < dev->data->nb_rx_queues; i++)
-rte_free(dev->data->rx_queues[i]);
-for (i = 0; i < dev->data->nb_tx_queues; i++)
-rte_free(dev->data->tx_queues[i]);
+if (dev->data->rx_queues)

Re: [dpdk-dev] [PATCH 1/3] net/szedata2: do not affect Ethernet interfaces

2018-04-12 Thread Matej Vido

On 11.04.2018 12:51, Ferruh Yigit wrote:

On 4/11/2018 10:36 AM, Matej Vido wrote:

On 10.04.2018 17:28, Ferruh Yigit wrote:

On 4/6/2018 3:12 PM, Matej Vido wrote:

NFB cards employ multiple Ethernet ports.
Until now, Ethernet port-related operations were performed on all of them
(since the whole card was represented as a single port).
With the new NFB-200G2QL card, this is no longer viable.

Since there is no fixed mapping between the queues and Ethernet ports,
and since a single card can be represented as two ports in DPDK,
there is no way of telling which (if any) physical ports should be
associated with individual ports in DPDK.

This is also described in documentation in more detail.

Signed-off-by: Matej Vido 
Signed-off-by: Jan Remes 
---
   config/common_base |   5 -
   .../nics/img/szedata2_nfb200g_architecture.svg | 171 +++

Hi Matej,

This patch fails to apply [1], can you please confirm you can apply it?

[1]
$ git apply --check
dpdk-dev-1-3-net-szedata2-do-not-affect-Ethernet-interfaces.patch
error: corrupt patch at line 270

Hi Ferruh,

I've got the same error on the patch downloaded from patchwork. It seems that
the difference between the downloaded patch and the patch generated from
git is that the long lines in the svg file are split into multiple lines in
the patch downloaded from patchwork. I suppose this could be the
problem. Any idea how to send a patch containing an svg file correctly?

cc'ed Ogawa-san for support,

I remember he fixed a similar issue in the past for spp, but I don't remember how.
Anyway, I've hopefully fixed this by redrawing the image to avoid those
long lines. I'm sending v2.


Thanks,
Matej



Thanks,
Matej





Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Maxime Coquelin



On 04/12/2018 09:35 AM, Maxime Coquelin wrote:



On 04/12/2018 09:34 AM, Chen, Junjie J wrote:




On 04/12/2018 09:21 AM, Tan, Jianfeng wrote:



On 4/12/2018 1:02 AM, Junjie Chen wrote:

dev_start sets *dev_attached* after setting up queues, which puts the device
into an invalid state since no frontend is attached. Also, destroy_device sets
*started* to zero, which keeps *allow_queuing* at zero until dev_start gets
called again. Actually, we should not determine queue existence by
*dev_attached* but by the queue pointers or other separate variable(s).

Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
dynamically")

Signed-off-by: Junjie Chen 
Tested-by: Jens Freimann 


Overall, looks great to me except a nit below.

Reviewed-by: Jianfeng Tan 


Thanks Jianfeng, I can handle the small change while applying.

Can you confirm that it is implied that the queues are already allocated,
else we wouldn't find the internal resource and would quit earlier (in case
eth_dev_close is called twice, for example)?


That is required; otherwise it generates a segfault if we close the device
before queue setup. For example, we execute the following steps in testpmd:
1. port attach
2. ctrl+D


Thanks for confirming Junjie, I will apply it as is then.

Reviewed-by: Maxime Coquelin 



Applied to dpdk-next-virtio/master

Thanks,
Maxime


[dpdk-dev] [PATCH v2 0/3] net/szedata2: patch set for new card support

2018-04-12 Thread Matej Vido
This patch set adds support for new card NFB-200G2QL.

v2:
Rebased on top of dpdk-next-net/master (conflict in release notes
for patch 2).
SVG image in patch 1 replaced by a redrawn image to avoid overly long lines.

Matej Vido (3):
  net/szedata2: do not affect Ethernet interfaces
  net/szedata2: add support for new NIC
  net/szedata2: add kernel module dependency

 config/common_base |   5 -
 .../nics/img/szedata2_nfb200g_architecture.svg | 214 +++
 doc/guides/nics/szedata2.rst   |  66 +-
 doc/guides/rel_notes/release_18_05.rst |   4 +
 drivers/net/szedata2/Makefile  |   1 -
 drivers/net/szedata2/rte_eth_szedata2.c| 684 ++---
 drivers/net/szedata2/rte_eth_szedata2.h|   4 +-
 drivers/net/szedata2/szedata2_iobuf.c  | 174 --
 drivers/net/szedata2/szedata2_iobuf.h  | 327 --
 9 files changed, 710 insertions(+), 769 deletions(-)
 create mode 100644 doc/guides/nics/img/szedata2_nfb200g_architecture.svg
 delete mode 100644 drivers/net/szedata2/szedata2_iobuf.c
 delete mode 100644 drivers/net/szedata2/szedata2_iobuf.h

-- 
1.8.3.1



[dpdk-dev] [PATCH v2 2/3] net/szedata2: add support for new NIC

2018-04-12 Thread Matej Vido
This patch adds support for new NIC NFB-200G2QL.

At the probing stage numa nodes for the DMA queues are identified
and the appropriate number of ports is allocated.
DMA queues residing on the same numa node are grouped in the same
port.

Signed-off-by: Matej Vido 
---
v2:
Rebased on top of dpdk-next-net/master (conflict in release notes).
---
 doc/guides/rel_notes/release_18_05.rst  |   4 +
 drivers/net/szedata2/rte_eth_szedata2.c | 545 +---
 drivers/net/szedata2/rte_eth_szedata2.h |   4 +-
 3 files changed, 441 insertions(+), 112 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst 
b/doc/guides/rel_notes/release_18_05.rst
index 8c0414a..e07d9b6 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -69,6 +69,10 @@ New Features
   See the :doc:`../nics/axgbe` nic driver guide for more details on this
   new driver.
 
+* **Updated szedata2 PMD.**
+
+  Added support for new NFB-200G2QL card.
+
 
 API Changes
 ---
diff --git a/drivers/net/szedata2/rte_eth_szedata2.c 
b/drivers/net/szedata2/rte_eth_szedata2.c
index a9dc1c7..5a8f2ed 100644
--- a/drivers/net/szedata2/rte_eth_szedata2.c
+++ b/drivers/net/szedata2/rte_eth_szedata2.c
@@ -38,18 +38,53 @@
 
 #define SZEDATA2_DEV_PATH_FMT "/dev/szedataII%u"
 
+/**
+ * Format string for suffix used to differentiate between Ethernet ports
+ * on the same PCI device.
+ */
+#define SZEDATA2_ETH_DEV_NAME_SUFFIX_FMT "-port%u"
+
+/**
+ * Maximum number of ports for one device.
+ */
+#define SZEDATA2_MAX_PORTS 2
+
+/**
+ * Entry in list of PCI devices for this driver.
+ */
+struct pci_dev_list_entry;
+struct pci_dev_list_entry {
+   LIST_ENTRY(pci_dev_list_entry) next;
+   struct rte_pci_device *pci_dev;
+   unsigned int port_count;
+};
+
+/* List of PCI devices with number of ports for this driver. */
+LIST_HEAD(pci_dev_list, pci_dev_list_entry) szedata2_pci_dev_list =
+   LIST_HEAD_INITIALIZER(szedata2_pci_dev_list);
+
+struct port_info {
+   unsigned int rx_base_id;
+   unsigned int tx_base_id;
+   unsigned int rx_count;
+   unsigned int tx_count;
+   int numa_node;
+};
+
 struct pmd_internals {
struct rte_eth_dev *dev;
uint16_t max_rx_queues;
uint16_t max_tx_queues;
-   char sze_dev[PATH_MAX];
-   struct rte_mem_resource *pci_rsc;
+   unsigned int rxq_base_id;
+   unsigned int txq_base_id;
+   char *sze_dev_path;
 };
 
 struct szedata2_rx_queue {
struct pmd_internals *priv;
struct szedata *sze;
uint8_t rx_channel;
+   uint16_t qid;
uint16_t in_port;
struct rte_mempool *mb_pool;
volatile uint64_t rx_pkts;
@@ -61,6 +96,7 @@ struct szedata2_tx_queue {
struct pmd_internals *priv;
struct szedata *sze;
uint8_t tx_channel;
+   uint16_t qid;
volatile uint64_t tx_pkts;
volatile uint64_t tx_bytes;
volatile uint64_t err_pkts;
@@ -870,7 +906,7 @@ struct szedata2_tx_queue {
if (rxq->sze == NULL) {
uint32_t rx = 1 << rxq->rx_channel;
uint32_t tx = 0;
-   rxq->sze = szedata_open(internals->sze_dev);
+   rxq->sze = szedata_open(internals->sze_dev_path);
if (rxq->sze == NULL)
return -EINVAL;
ret = szedata_subscribe3(rxq->sze, &rx, &tx);
@@ -915,7 +951,7 @@ struct szedata2_tx_queue {
if (txq->sze == NULL) {
uint32_t rx = 0;
uint32_t tx = 1 << txq->tx_channel;
-   txq->sze = szedata_open(internals->sze_dev);
+   txq->sze = szedata_open(internals->sze_dev_path);
if (txq->sze == NULL)
return -EINVAL;
ret = szedata_subscribe3(txq->sze, &rx, &tx);
@@ -1179,12 +1215,15 @@ struct szedata2_tx_queue {
const struct rte_eth_rxconf *rx_conf __rte_unused,
struct rte_mempool *mb_pool)
 {
-   struct pmd_internals *internals = dev->data->dev_private;
struct szedata2_rx_queue *rxq;
int ret;
-   uint32_t rx = 1 << rx_queue_id;
+   struct pmd_internals *internals = dev->data->dev_private;
+   uint8_t rx_channel = internals->rxq_base_id + rx_queue_id;
+   uint32_t rx = 1 << rx_channel;
uint32_t tx = 0;
 
+   PMD_INIT_FUNC_TRACE();
+
if (dev->data->rx_queues[rx_queue_id] != NULL) {
eth_rx_queue_release(dev->data->rx_queues[rx_queue_id]);
dev->data->rx_queues[rx_queue_id] = NULL;
@@ -1200,7 +1239,7 @@ struct szedata2_tx_queue {
}
 
rxq->priv = internals;
-   rxq->sze = szedata_open(internals->sze_dev);
+   rxq->sze = szedata_open(internals->sze_dev_path);
if (rxq->sze == NULL) {
PMD_INIT_LOG(ERR, "szedata_open() failed for rx queue id "
"%" PRIu16 "!", rx_queue_id);
@@ -1214,7 +1253,8 @@ struct szedata2_t

[dpdk-dev] [PATCH v2 1/3] net/szedata2: do not affect Ethernet interfaces

2018-04-12 Thread Matej Vido
NFB cards employ multiple Ethernet ports.
Until now, Ethernet port-related operations were performed on all of them
(since the whole card was represented as a single port).
With the new NFB-200G2QL card, this is no longer viable.

Since there is no fixed mapping between the queues and Ethernet ports,
and since a single card can be represented as two ports in DPDK,
there is no way of telling which (if any) physical ports should be
associated with individual ports in DPDK.

This is also described in documentation in more detail.

Signed-off-by: Matej Vido 
Signed-off-by: Jan Remes 
---
v2:
Rebased on top of dpdk-next-net/master.
SVG image replaced by a redrawn image to avoid overly long lines.
---
 config/common_base |   5 -
 .../nics/img/szedata2_nfb200g_architecture.svg | 214 ++
 doc/guides/nics/szedata2.rst   |  66 +++--
 drivers/net/szedata2/Makefile  |   1 -
 drivers/net/szedata2/rte_eth_szedata2.c| 137 +
 drivers/net/szedata2/szedata2_iobuf.c  | 174 ---
 drivers/net/szedata2/szedata2_iobuf.h  | 327 -
 7 files changed, 268 insertions(+), 656 deletions(-)
 create mode 100644 doc/guides/nics/img/szedata2_nfb200g_architecture.svg
 delete mode 100644 drivers/net/szedata2/szedata2_iobuf.c
 delete mode 100644 drivers/net/szedata2/szedata2_iobuf.h

diff --git a/config/common_base b/config/common_base
index 6c7e7fd..8d948c3 100644
--- a/config/common_base
+++ b/config/common_base
@@ -310,11 +310,6 @@ CONFIG_RTE_LIBRTE_SFC_EFX_DEBUG=n
 # Compile software PMD backed by SZEDATA2 device
 #
 CONFIG_RTE_LIBRTE_PMD_SZEDATA2=n
-#
-# Defines firmware type address space.
-# See documentation for supported values.
-# Other values raise compile time error.
-CONFIG_RTE_LIBRTE_PMD_SZEDATA2_AS=0
 
 #
 # Compile burst-oriented Cavium Thunderx NICVF PMD driver
diff --git a/doc/guides/nics/img/szedata2_nfb200g_architecture.svg 
b/doc/guides/nics/img/szedata2_nfb200g_architecture.svg
new file mode 100644
index 000..e152e4a
--- /dev/null
+++ b/doc/guides/nics/img/szedata2_nfb200g_architecture.svg
@@ -0,0 +1,214 @@
+   [SVG markup trimmed in this archive view. The diagram shows the
+   NFB-200G2QL architecture: two Ethernet ports (ETH 0, ETH 1), a PCI-E
+   master slot and a PCI-E slave slot, DMA queues 0-15 and 16-31, and
+   CPU 0 / CPU 1.]
diff --git a/doc/guides/nics/szedata2.rst b/doc/guides/nics/szedata2.rst
index 4327e4e..1b4b3eb 100644
--- a/doc/guides/nics/szedata2.rst
+++ b/doc/guides/nics/szedata2.rst
@@ -43,8 +43,10 @@ separately:
 
 *  **Kernel modules**
 
+   * combo6core
* combov3
-   * szedata2_cv3
+   * szedata2
+   * szedata2_cv3 or szedata2_cv3_fdt
 
Kernel modules manage initialization of hardware, allocation and
sharing of resources for user space applications.
@@ -62,45 +64,53 @@ These configuration options can be modified before 
compilation in the
 
Value **y** enables compilation of szedata2 PMD.
 
-*  ``CONFIG_RTE_LIBRTE_PMD_SZEDATA2_AS`` default value: **0**
-
-   This option defines type of firmware address space and must be set
-   according to the used card and mode.
-   Currently supported values are:
-
-   * **0** - for cards (modes):
-
-  * NFB-100G1 (100G1)
+Using the SZEDATA2 PMD
+--
 
-   * **1** - for cards (modes):
+From DPDK version 16.04 the type of SZEDATA2 PMD is changed to PMD_PDEV.
+SZEDATA2 device is automatically recognized during EAL initialization.
+No special command line options are needed.
 
-  * NFB-100G2Q (100G1)
+Kernel modules have to be loaded before running the DPDK application.
 
-   * **2** - for cards (modes):
+NFB card architecture
+-
 
-  * NFB-40G2 (40G2)
-  * NFB-100G2C (100G2)
-  * NFB-100G2Q (40G2)
+The NFB cards are multi-port multi-queue cards, where (generally) data from any
+Ethernet port may be sent to any queue.
+They were historically represented in DPDK as a single port.
 
-   * **3** - for cards (modes):
+However, the new NFB-200G2QL card employs an addon cable which allows
+connecting it to two physical PCI-E slots at the same time (see the diagram
+below).
+This is done to allow 200 Gbps of traffic to be transferred through the PCI-E
+bus (note that a single PCI-E 3.0 x16 slot provides only 125 Gbps theoretical
+throughput).
 
-  * NFB-40G2 (10G8)
-

[dpdk-dev] [PATCH v2 3/3] net/szedata2: add kernel module dependency

2018-04-12 Thread Matej Vido
New kernel module dependency is required to support NFB-200G2QL card.

Signed-off-by: Matej Vido 
---
 drivers/net/szedata2/rte_eth_szedata2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/szedata2/rte_eth_szedata2.c 
b/drivers/net/szedata2/rte_eth_szedata2.c
index 5a8f2ed..d81b777 100644
--- a/drivers/net/szedata2/rte_eth_szedata2.c
+++ b/drivers/net/szedata2/rte_eth_szedata2.c
@@ -1917,7 +1917,7 @@ static int szedata2_eth_pci_remove(struct rte_pci_device 
*pci_dev)
 RTE_PMD_REGISTER_PCI(RTE_SZEDATA2_DRIVER_NAME, szedata2_eth_driver);
 RTE_PMD_REGISTER_PCI_TABLE(RTE_SZEDATA2_DRIVER_NAME, 
rte_szedata2_pci_id_table);
 RTE_PMD_REGISTER_KMOD_DEP(RTE_SZEDATA2_DRIVER_NAME,
-   "* combo6core & combov3 & szedata2 & szedata2_cv3");
+   "* combo6core & combov3 & szedata2 & ( szedata2_cv3 | szedata2_cv3_fdt 
)");
 
 RTE_INIT(szedata2_init_log);
 static void
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH V21 2/4] eal: add device event monitor framework

2018-04-12 Thread Thomas Monjalon
06/04/2018 05:55, Jeff Guo:
> v21->v20:

This is a very high number of revisions.
I cannot see them in my mail client because they are too deeply nested
and indented in the thread representation.
Tip: when sending a new revision, it is better to thread it with the
first revision, so we do not end up with infinite nesting.


> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> +* **Added device event monitor framework.**
> +
> +  Added a general device event monitor framework at EAL, for device dynamic 
> management.
> +  Such as device hotplug awareness and actions adopted accordingly. The list 
> of new APIs:
> +
> +  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` are 
> for
> +the event monitor enable and disable.
> +  * ``rte_dev_event_callback_register`` and 
> ``rte_dev_event_callback_unregister``
> +are for the user's callbacks register and unregister.
>  
>  API Changes

Please keep 2 blank lines before the title.


> +/* The device event callback list for all registered callbacks. */
> +static struct dev_event_cb_list dev_event_cbs;
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);

There is a compilation error with clang:

lib/librte_eal/common/eal_common_dev.c:37:33: fatal error:
tentative definition of variable with internal linkage
has incomplete non-array type
'struct dev_event_cb_list' 
[-Wtentative-definition-incomplete-type]
static struct dev_event_cb_list dev_event_cbs;
^


> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -258,5 +258,9 @@ EXPERIMENTAL {
>   rte_service_start_with_defaults;
>   rte_socket_count;
>   rte_socket_id_by_idx;
> + rte_dev_event_monitor_start;
> + rte_dev_event_monitor_stop;
> + rte_dev_event_callback_register;
> + rte_dev_event_callback_unregister;
>  
>  } DPDK_18.02;

Please keep the alphabetical order.




[dpdk-dev] [PATCH] maintainers: add backup maintainer for next-crypto tree

2018-04-12 Thread Akhil Goyal
Signed-off-by: Akhil Goyal 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e54c1f0..b46d04a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -45,6 +45,7 @@ T: git://dpdk.org/next/dpdk-next-virtio
 
 Next-crypto Tree
 M: Pablo de Lara 
+M: Akhil Goyal 
 T: git://dpdk.org/next/dpdk-next-crypto
 
 Next-eventdev Tree
-- 
2.9.3



[dpdk-dev] [PATCH v6 1/2] vhost: add support for interrupt mode

2018-04-12 Thread Junjie Chen
In some cases we want vhost dequeue to work in interrupt mode, to
release CPUs to others when there is no data to transmit. So we install
an interrupt handler for the vhost device and interrupt vectors for each
rx queue when creating a new backend, according to the vhost interrupt
configuration. Thus, applications can register an epoll event fd
to associate rx queues with interrupt vectors.

Signed-off-by: Junjie Chen 
---
Changes in v6:
- rebase code to master
Changes in v5:
- update license to DPDK new license format
- rebase code to master 
Changes in v4:
- revert back license change
Changes in v3:
- handle failure in the middle of intr setup.
- use vhost API to enable interrupt.
- rebase to check rxq existence.
- update vhost API to support guest notification.
Changes in v2:
- update rx queue index.
- fill efd_counter_size for intr handler.
- update log.
 drivers/net/vhost/rte_eth_vhost.c | 158 +-
 lib/librte_vhost/vhost.c  |  14 ++--
 2 files changed, 163 insertions(+), 9 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index e392d71..8b4b716 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -520,6 +520,136 @@ find_internal_resource(char *ifname)
return list;
 }
 
+static int
+eth_rxq_intr_enable(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct rte_vhost_vring vring;
+   struct vhost_queue *vq;
+   int ret = 0;
+
+   vq = dev->data->rx_queues[qid];
+   if (!vq) {
+   RTE_LOG(ERR, PMD, "rxq%d is not setup yet\n", qid);
+   return -1;
+   }
+
+   ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
+   if (ret < 0) {
+   RTE_LOG(ERR, PMD, "Failed to get rxq%d's vring\n", qid);
+   return ret;
+   }
+   RTE_LOG(INFO, PMD, "Enable interrupt for rxq%d\n", qid);
+   rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 1);
+   rte_wmb();
+
+   return ret;
+}
+
+static int
+eth_rxq_intr_disable(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct rte_vhost_vring vring;
+   struct vhost_queue *vq;
+   int ret = 0;
+
+   vq = dev->data->rx_queues[qid];
+   if (!vq) {
+   RTE_LOG(ERR, PMD, "rxq%d is not setup yet\n", qid);
+   return -1;
+   }
+
+   ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
+   if (ret < 0) {
+   RTE_LOG(ERR, PMD, "Failed to get rxq%d's vring", qid);
+   return ret;
+   }
+   RTE_LOG(INFO, PMD, "Disable interrupt for rxq%d\n", qid);
+   rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 0);
+   rte_wmb();
+
+   return 0;
+}
+
+static void
+eth_vhost_uninstall_intr(struct rte_eth_dev *dev)
+{
+   struct rte_intr_handle *intr_handle = dev->intr_handle;
+
+   if (intr_handle) {
+   if (intr_handle->intr_vec)
+   free(intr_handle->intr_vec);
+   free(intr_handle);
+   }
+
+   dev->intr_handle = NULL;
+}
+
+static int
+eth_vhost_install_intr(struct rte_eth_dev *dev)
+{
+   struct rte_vhost_vring vring;
+   struct vhost_queue *vq;
+   int count = 0;
+   int nb_rxq = dev->data->nb_rx_queues;
+   int i;
+   int ret;
+
+   /* uninstall firstly if we are reconnecting */
+   if (dev->intr_handle)
+   eth_vhost_uninstall_intr(dev);
+
+   dev->intr_handle = malloc(sizeof(*dev->intr_handle));
+   if (!dev->intr_handle) {
+   RTE_LOG(ERR, PMD, "Fail to allocate intr_handle\n");
+   return -ENOMEM;
+   }
+   memset(dev->intr_handle, 0, sizeof(*dev->intr_handle));
+
+   dev->intr_handle->efd_counter_size = sizeof(uint64_t);
+
+   dev->intr_handle->intr_vec =
+   malloc(nb_rxq * sizeof(dev->intr_handle->intr_vec[0]));
+
+   if (!dev->intr_handle->intr_vec) {
+   RTE_LOG(ERR, PMD,
+   "Failed to allocate memory for interrupt vector\n");
+   free(dev->intr_handle);
+   return -ENOMEM;
+   }
+
+   RTE_LOG(INFO, PMD, "Prepare intr vec\n");
+   for (i = 0; i < nb_rxq; i++) {
+   vq = dev->data->rx_queues[i];
+   if (!vq) {
+   RTE_LOG(INFO, PMD, "rxq-%d not setup yet, skip!\n", i);
+   continue;
+   }
+
+   ret = rte_vhost_get_vhost_vring(vq->vid, (i << 1) + 1, &vring);
+   if (ret < 0) {
+   RTE_LOG(INFO, PMD,
+   "Failed to get rxq-%d's vring, skip!\n", i);
+   continue;
+   }
+
+   if (vring.kickfd < 0) {
+   RTE_LOG(INFO, PMD,
+   "rxq-%d's kickfd is invalid, skip!\n", i);
+   continue;
+   }
+   dev->intr_handle->intr_vec[i] = RTE_INTR_V

[dpdk-dev] [PATCH v6 2/2] net/vhost: update license to SPDX format

2018-04-12 Thread Junjie Chen
Update license to SPDX, also add Intel license.

Signed-off-by: Junjie Chen 
---
 drivers/net/vhost/rte_eth_vhost.c | 34 +++---
 1 file changed, 3 insertions(+), 31 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 8b4b716..c6b8637 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1,34 +1,6 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright (c) 2016 IGEL Co., Ltd.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of IGEL Co.,Ltd. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 IGEL Co., Ltd.
+ * Copyright(c) 2016-2018 Intel Corporation
  */
 #include 
 #include 
-- 
2.0.1



Re: [dpdk-dev] [PATCH v2 4/4] ether: add packet modification aciton in flow API

2018-04-12 Thread Zhang, Qi Z
Hi Adrien

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Thursday, April 12, 2018 3:04 PM
> To: Zhang, Qi Z 
> Cc: dev@dpdk.org; Doherty, Declan ; Chandran,
> Sugesh ; Glynn, Michael J
> ; Liu, Yu Y ; Ananyev,
> Konstantin ; Richardson, Bruce
> 
> Subject: Re: [PATCH v2 4/4] ether: add packet modification aciton in flow API
> 
> On Sun, Apr 01, 2018 at 05:19:22PM -0400, Qi Zhang wrote:
> > Add new actions that be used to modify packet content with generic
> > semantic:
> >
> > RTE_FLOW_ACTION_TYPE_FIELD_UPDATE:
> > - update specific field of packet
> > RTE_FLWO_ACTION_TYPE_FIELD_INCREMENT:
> > - increament specific field of packet
> > RTE_FLWO_ACTION_TYPE_FIELD_DECREMENT:
> > - decreament specific field of packet
> > RTE_FLWO_ACTION_TYPE_FIELD_COPY:
> > - copy data from one field to another in packet.
> >
> > All action use struct rte_flow_item parameter to match the pattern
> > that going to be modified, if no pattern match, the action just be
> > skipped.
> 
> That's not good. It must result in undefined behavior, more about that below.

I may not be getting your point; see my comment below.

> 
> > These action are non-terminating action. they will not impact the fate
> > of the packets.
> >
> > Signed-off-by: Qi Zhang 
> 
> Noticed a few typos above and in subject line ("aciton", "FLWO", "increament",
> "decreament").
> 
> Note that I'm usually against using rte_flow_item structures and associated
> enum values inside action lists because it could be seen as inconsistent from
> an API standpoint. On the other hand, reusing existing types is a good thing 
> so
> let's go with that for now.
> 
> Please see inline comments.
> 
> > ---
> >  doc/guides/prog_guide/rte_flow.rst | 89
> ++
> >  lib/librte_ether/rte_flow.h| 57 
> >  2 files changed, 146 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index aa5c818..6628964 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -1508,6 +1508,95 @@ Representor.
> > | ``port_id``  | identification of the destination |
> > +--+---+
> >
> > +Action: ``FILED_UPDATE``
> > +^^^
> 
> FILED => FIELD
> 
> Underline is also shorter than title and might cause documentation warnings.
> 
> > +
> > +Update specific field of the packet.
> > +
> > +- Non-terminating by default.
> 
> These statements are not needed since "ethdev: alter behavior of flow API
> actions" [1].
> 
> [1] http://dpdk.org/ml/archives/dev/2018-April/096527.html
> 
> > +
> > +.. _table_rte_flow_action_field_update:
> > +
> > +.. table:: FIELD_UPDATE
> > +
> > +   +---+-+
> > +   | Field | Value
> |
> > +
> +===+
> =+
> > +   | ``item``  | item->type: specify the pattern to modify
> |
> > +   |   | item->spec: specify the new value to update
> |
> > +   |   | item->mask: specify which part of the pattern to update
> |
> > +   |   | item->last: ignored
> |
> 
> This table needs to be divided a bit more with one cell per field for better
> clarity. See other pattern item definitions such as "Item: ``RAW``" for an
> example.
> 
> > +   +---+-+
> > +   | ``layer`` | 0 means outermost matched pattern,
> |
> > +   |   | 1 means next-to-outermost and so on ...
> |
> > +
> > + +---+---
> > + --+
> 
> What does "layer" refer to by the way? The layer described on the pattern side
> of the flow rule, the actual protocol layer matched inside traffic, or is 
> "item"
> actually an END-terminated list of items (as suggested by "pattern" in above
> documentation)?
> 
> I suspect the intent is for layer to have the same definition as RSS 
> encapulation
> level ("ethdev: add encap level to RSS flow API action" [2]), and item points 
> to
> a single item, correct?

Yes
> 
> In that case, it's misleading, please rename it "level". Also keep in mind you
> can't make an action rely on anything found on the pattern side of a flow 
> rule.
> 
OK, "Level" looks better.
Also, I may not be getting your point here, so please correct me.
My understanding is that all the modification actions of a flow are independent
of the patterns of the same flow.
For example, when defining a flow with pattern = eth/ipv4 and a TCP
modification action, all ipv4 packets will hit that flow and go to the same
destination, but only TCP packets will be modified; otherwise, the action is
just skipped.

> What happens when this action is attempted on non-matching traffic must be
> documented here as well. Refer to discussion re "ethdev: Add tunnel
> encap/decap actions" [3]. To b

Re: [dpdk-dev] Survey for final decision about per-port offload API

2018-04-12 Thread Shreyansh Jain

On Friday 30 March 2018 07:17 PM, Thomas Monjalon wrote:

There are some discussions about a specific part of the offload API:
"To enable per-port offload, the offload should be set on both
device configuration and queue setup."

It means the application must repeat the port offload flags
in rte_eth_conf.[rt]xmode.offloads and rte_eth_[rt]xconf.offloads,
when calling respectively rte_eth_dev_configure() and
rte_eth_[rt]x_queue_setup for each queue.

The PMD must check if there is mismatch, i.e. a port offload not
repeated in queue setup.
There is a proposal to do this check at ethdev level:
http://dpdk.org/ml/archives/dev/2018-March/094023.html

It was also proposed to relax the API and allow "forgetting" port
offloads in queue offloads:
http://dpdk.org/ml/archives/dev/2018-March/092978.html

It would mean the offloads applied to a queue are the result of an OR operation:
rte_eth_conf.[rt]xmode.offloads | rte_eth_[rt]xconf.offloads



With respect to DPAA and DPAA2 PMDs:


1/ Do you agree with above API change?


Yes




If we agree with this change, we need to update the documentation
and remove the checks in PMDs.
Note: no matter what is decided here, 18.05-rc1 should have all PMDs
switched to the API which was defined in 17.11.
Given that API is new and not yet adopted by the applications,
the sooner it is fixed, the better.

2/ Should we do this change in 18.05-rc2?


Yes




At the same time, we want to make clear that an offload enabled at
port level cannot be disabled at queue level.

3/ Do you agree with above statement (to be added in the doc)?


Yes




There is the same kind of confusion in the offload capabilities:
rte_eth_dev_info.[rt]x_offload_capa
rte_eth_dev_info.[rt]x_queue_offload_capa
The queue capabilities must be a subset of the port capabilities,
i.e. every queue capability must also be reported as a port capability.
But a port capability should be reported at queue level
only if it can be applied to a specific queue.

4/ Do you agree with above statement (to be added in the doc)?


Yes




Please give your opinion on questions 1, 2, 3 and 4.
Answering by yes/no may be sufficient in most cases :)
Thank you






[dpdk-dev] [PATCH v6 2/2] net/vhost: update license to SPDX format

2018-04-12 Thread Junjie Chen
Update license to SPDX, also add Intel license.

Signed-off-by: Junjie Chen 
---
 drivers/net/vhost/rte_eth_vhost.c | 34 +++---
 drivers/net/vhost/rte_eth_vhost.h | 35 +++
 2 files changed, 6 insertions(+), 63 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 8b4b716..c6b8637 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1,34 +1,6 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright (c) 2016 IGEL Co., Ltd.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of IGEL Co.,Ltd. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 IGEL Co., Ltd.
+ * Copyright(c) 2016-2018 Intel Corporation
  */
 #include 
 #include 
diff --git a/drivers/net/vhost/rte_eth_vhost.h 
b/drivers/net/vhost/rte_eth_vhost.h
index 948f3c8..0e68b9f 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -1,36 +1,7 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2016 IGEL Co., Ltd.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of IGEL Co., Ltd. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 IGEL Co., Ltd.
+ * Copyright(c) 2016-2018 Intel Corporation
  */
-
 #ifndef _RTE_ETH_VHOST_H_
 #define _RTE_ETH_VHOST_H_
 
-- 
2.0.1



Re: [dpdk-dev] [PATCH v2 01/15] net/mlx5: support 16 hardware priorities

2018-04-12 Thread Nélio Laranjeiro
On Tue, Apr 10, 2018 at 03:22:46PM +, Xueming(Steven) Li wrote:
> Hi Nelio,
> 
> > -Original Message-
> > From: Nélio Laranjeiro 
> > Sent: Tuesday, April 10, 2018 10:42 PM
> > To: Xueming(Steven) Li 
> > Cc: Shahaf Shuler ; dev@dpdk.org
> > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> > 
> > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > Adjust flow priority mapping to adapt new hardware 16 verb flow
> > > priorities support:
> > > 0-3: RTE FLOW tunnel rule
> > > 4-7: RTE FLOW non-tunnel rule
> > > 8-15: PMD control flow
> > 
> > This commit log is inducing people in error, this amount of priority
> > depends on the Mellanox OFED installed, it is not available on upstream
> > Linux kernel yet nor in the current Mellanox OFED GA.
> > 
> > What happens when those amount of priority are not available, is it
> > removing a functionality?  Will it collide with other flows?
> 
> If 16 priorities are not available, it simply behaves as with 8 priorities.

It is not described in the commit log, please add it.

> > > Signed-off-by: Xueming Li 

> > >   },
> > >   [HASH_RXQ_ETH] = {
> > >   .hash_fields = 0,
> > >   .dpdk_rss_hf = 0,
> > > - .flow_priority = 3,
> > > + .flow_priority = 2,
> > >   },
> > >  };
> > 
> > If the amount of priorities remains 8, you are removing the priority for
> > the tunnel flows introduced by commit 749365717f5c ("net/mlx5: change
> > tunnel flow priority")
> > 
> > Please keep this functionality when this patch fails to get the expected
> > 16 Verbs priorities.
> 
> These priority shifts are different in the 16-priority scenario, so I changed
> it to a calculation. In function mlx5_flow_priorities_detect(), the priority
> shift will be 1 with 8 priorities and 4 with 16 priorities. Please refer to
> the changes in function mlx5_flow_update_priority() as well.

Please light my lamp, I don't see it...
 

> > >  static void
> > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > +   struct mlx5_flow_parse *parser,
> > > const struct rte_flow_attr *attr)  {
> > > + struct priv *priv = dev->data->dev_private;
> > >   unsigned int i;
> > > + uint16_t priority;
> > >
> > > + if (priv->config.flow_priority_shift == 1)
> > > + priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > + else
> > > + priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > + if (!parser->inner)
> > > + priority += priv->config.flow_priority_shift;
> > >   if (parser->drop) {
> > > - parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> > > - attr->priority +
> > > - hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > > + parser->queue[HASH_RXQ_ETH].ibv_attr->priority = priority +
> > > + hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > >   return;
> > >   }
> > >   for (i = 0; i != hash_rxq_init_n; ++i) {
> > > - if (parser->queue[i].ibv_attr) {
> > > - parser->queue[i].ibv_attr->priority =
> > > - attr->priority +
> > > - hash_rxq_init[i].flow_priority -
> > > - (parser->inner ? 1 : 0);
> > > - }
> > > + if (!parser->queue[i].ibv_attr)
> > > + continue;
> > > + parser->queue[i].ibv_attr->priority = priority +
> > > + hash_rxq_init[i].flow_priority;

Previous code was subtracting one from the table priorities which was
starting at 1.  In the new code I don't see it.

What I am missing?

> > >   }
> > >  }
> > >
> > > @@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > >   .layer = HASH_RXQ_ETH,
> > >   .mark_id = MLX5_FLOW_MARK_DEFAULT,
> > >   };
> > > - ret = mlx5_flow_convert_attributes(attr, error);
> > > + ret = mlx5_flow_convert_attributes(dev, attr, error);
> > >   if (ret)
> > >   return ret;
> > >   ret = mlx5_flow_convert_actions(dev, actions, error, parser); @@
> > > -1158,7 +1168,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > >*/
> > >   if (!parser->drop)
> > >   mlx5_flow_convert_finalise(parser);
> > > - mlx5_flow_update_priority(parser, attr);
> > > + mlx5_flow_update_priority(dev, parser, attr);
> > >  exit_free:
> > >   /* Only verification is expected, all resources should be released.
> > */
> > >   if (!parser->create) {
> > > @@ -2450,7 +2460,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
> > >   struct priv *priv = dev->data->dev_private;
> > >   const struct rte_flow_attr attr = {
> > >   .ingress = 1,
> > > - .priority = MLX5_CTRL_FLOW_PRIORITY,
> > > + .priority = priv->config.control_flow_priority,
> > >   };
> > >   struct rte_flow_item items[] = {
> > >   {
> > > @@ -3161,3 +3171,50 @@ mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
> > >   }
> > >   return 0;
> > >  }
> 

Re: [dpdk-dev] [PATCH v2 3/4] ether: add more protocol support in flow API

2018-04-12 Thread Adrien Mazarguil
On Thu, Apr 12, 2018 at 05:12:08AM +, Zhang, Qi Z wrote:
> Hi Adrien:
> 
>   Thank you so much for your careful review and helpful suggestions!
>   I agree with most of your comments, except a couple of questions about 
> RTE_FLOW_ITEM_TYPE_TGT_ADDR and RTE_FLOW_ITEM_IPV6_EXT_HDR
>   Please see my comments inline.
> 
> Thanks!
> Qi

Thanks, replying inline also.

> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Thursday, April 12, 2018 12:32 AM
> > To: Zhang, Qi Z 
> > Cc: dev@dpdk.org; Doherty, Declan ; Chandran,
> > Sugesh ; Glynn, Michael J
> > ; Liu, Yu Y ; Ananyev,
> > Konstantin ; Richardson, Bruce
> > 
> > Subject: Re: [PATCH v2 3/4] ether: add more protocol support in flow API
> > 
> > On Sun, Apr 01, 2018 at 05:19:21PM -0400, Qi Zhang wrote:
> > > Add new protocol header match support as below
> > >
> > > RTE_FLOW_ITEM_TYPE_ARP
> > >   - match IPv4 ARP header
> > > RTE_FLOW_ITEM_TYPE_EXT_HDR_ANY
> > >   - match any IPv6 extension header
> > 
> > While properly defined in the patch, "IPV6" is missing here.
> > 
> > > RTE_FLOW_ITEM_TYPE_ICMPV6
> > >   - match IPv6 ICMP header
> > > RTE_FLOW_ITEM_TYPE_ICMPV6_TGT_ADDR
> > >   - match IPv6 ICMP Target address
> > > RTE_FLOW_ITEM_TYPE_ICMPV6_SSL
> > >   - match IPv6 ICMP Source Link-layer address
> > > RTE_FLOW_ITEM_TYPE_ICMPV6_TTL
> > >   - match IPv6 ICMP Target Link-layer address
> > >
> > > Signed-off-by: Qi Zhang 
> > 
> > First, since they are added at the end of enum rte_flow_item_type, no ABI
> > breakage notice is necessary.
> > 
> > However testpmd implementation [1][2] and documentation update [3][4] are
> > mandatory for all new pattern items and actions.
> 
> OK, will add this into v2.
> 
> > 
> > More comments below regarding these definitions.
> > 
> > [1] flow_item[] in app/test-pmd/config.c [2] using ITEM_ICMP as an example
> > in app/test-pmd/cmdline_flow.c [3] "Pattern items" section in
> > doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > [4] using "Item: ``ICMP``" section as an example in
> > doc/guides/prog_guide/rte_flow.rst
> > 
> > > ---
> > >  lib/librte_ether/rte_flow.h | 160
> > > 
> > >  1 file changed, 160 insertions(+)
> > >
> > > diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
> > > index 8f75db0..a8ec780 100644
> > > --- a/lib/librte_ether/rte_flow.h
> > > +++ b/lib/librte_ether/rte_flow.h
> > > @@ -323,6 +323,49 @@ enum rte_flow_item_type {
> > >* See struct rte_flow_item_geneve.
> > >*/
> > >   RTE_FLOW_ITEM_TYPE_GENEVE,
> > > +
> > > + /**
> > > +  * Matches ARP IPv4 header.
> > 
> > => Matches an IPv4 ARP header.
> > 
> > > +  *
> > > +  * See struct rte_flow_item_arp.
> > > +  */
> > > + RTE_FLOW_ITEM_TYPE_ARP,
> > 
> > While you're right to make "IPv4" clear since ARP is also used for other
> > protocols DPDK doesn't support (and likely never will), the ARP header has
> > both a fixed and a variably-sized part.
> > 
> > Ideally an ARP pattern item should match the fixed part only and a separate
> > ARP_IPV4 match its payload, somewhat like you did for ICMPv6/NDP below.
> > 
> > Problem is that in DPDK, struct arp_hdr includes struct arp_ipv4, so one
> > suggestion would be to rename this pattern item ARP_IPV4 directly:
> > 
> > => RTE_FLOW_ITEM_TYPE_ARP_IPV4
> > 
> > > +
> > > + /**
> > > +  * Matches any IPv6 Extension header.
> > 
> > => Matches an IPv6 extension header.
> > 
> > > +  *
> > > +  * See struct rte_flow_item_ipv6_ext_any.
> > > +  */
> > > + RTE_FLOW_ITEM_TYPE_IPV6_EXT_HDR_ANY,
> > 
> > I'm not sure this definition is necessary, more below about that.
> > 
> > Also I don't see a benefit in having "ANY" part of the name, if you want to 
> > keep
> > it, I suggest the simpler:
> > 
> > => RTE_FLOW_ITEM_TYPE_IPV6_EXT
> > 
> > > +
> > > + /**
> > > +  * Matches ICMPv6 header.
> > 
> > => Matches an ICMPv6 header.
> > 
> > > +  *
> > > +  * See struct rte_flow_item_icmpv6
> > 
> > Missing "."
> > 
> > > +  */
> > > + RTE_FLOW_ITEM_TYPE_ICMPV6,
> > > +
> > 
> > Before entering NDP territory below, I understand those should be stacked on
> > top of RTE_FLOW_ITEM_TYPE_ICMPV6. It's fine but for clarity they should be
> > named after the NDP types they represent, not inner data fields.
> > 
> > Also I think we should consider NDP as a protocol sitting on top of ICMPv6. 
> > We
> > could therefore drop "ICMP" from these definitions.
> > 
> > Since "ND" is a common shorthand for this protocol and "6" another when
> > doing something related to IPv6, I suggest to use "ND6" to name he related
> > pattern items.
> 
> I agree.
> 
> > 
> > These are the reasons behind my next suggestions:
> > 
> > > + /**
> > > +  * Match ICMPv6 target address.
> > > +  *
> > > +  * See struct rte_flow_item_icmpv6_tgt_addr.
> > > +  */
> > > + RTE_FLOW_ITEM_TYPE_ICMPV6_TGT_ADDR,
> > 
> > => Matches an IPv6 network discovery router solicitation.
> > => See struct rte_flow_item_nd6_rs.
> > => RTE

Re: [dpdk-dev] [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification

2018-04-12 Thread Nélio Laranjeiro
On Wed, Apr 11, 2018 at 08:11:50AM +, Xueming(Steven) Li wrote:
> Hi Nelio,
> 
> > -Original Message-
> > From: Nélio Laranjeiro 
> > Sent: Tuesday, April 10, 2018 11:17 PM
> > To: Xueming(Steven) Li 
> > Cc: Shahaf Shuler ; dev@dpdk.org
> > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > identification
> > 
> > On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > > This patch introduced tunnel type identification based on flow rules.
> > > If flows of multiple tunnel types built on same queue,
> > > RTE_PTYPE_TUNNEL_MASK will be returned, bits in flow mark could be
> > > used as tunnel type identifier.
> > 
> > I don't see anywhere in this patch where the bits are reserved to identify
> > a flow, nor values which can help to identify it.
> > 
> > Is this missing?
> > 
> > Anyway we have already very few bits in the mark making it difficult to be
> > used by the user, reserving again some to may lead to remove the mark
> > support from the flows.
> 
> Not all users will use multiple tunnel types; this is not included in this
> patch set and is left to the user's decision. I'll update the comments to
> make this clear.

Thanks,

> > > Signed-off-by: Xueming Li 

> > >  /**
> > > + * RXQ update after flow rule creation.
> > > + *
> > > + * @param dev
> > > + *   Pointer to Ethernet device.
> > > + * @param flow
> > > + *   Pointer to the flow rule.
> > > + */
> > > +static void
> > > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct rte_flow
> > > +*flow) {
> > > + struct priv *priv = dev->data->dev_private;
> > > + unsigned int i;
> > > +
> > > + if (!dev->data->dev_started)
> > > + return;
> > > + for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > > + struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > > +  [(*flow->queues)[i]];
> > > + struct mlx5_rxq_ctrl *rxq_ctrl =
> > > + container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > > + uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > > +
> > > + rxq_data->mark |= flow->mark;
> > > + if (!tunnel)
> > > + continue;
> > > + rxq_ctrl->tunnel_types[tunnel] += 1;
> > 
> > I don't understand why you need such array, the NIC is unable to return
> > the tunnel type has it returns only one bit saying tunnel.
> > Why don't it store in the priv structure the current configured tunnel?
> 
> This array is used to count the tunnel types bound to a queue. If only one
> tunnel type is bound, ptype will report that tunnel type; TUNNEL_MASK (the
> max value) will be returned if multiple types are bound to a queue.
> 
> The flow RSS action specifies the queues bound to a tunnel, thus we can't
> assume all queues have the same tunnel types, so this is a per-queue
> structure.

There is something I am missing here: how, in the data plane, can the PMD
tell from 1 bit which kind of tunnel the packet is matching?


> > > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct
> > > mlx5_flows *list)  {
> > >   struct priv *priv = dev->data->dev_private;
> > >   struct rte_flow *flow;
> > > + unsigned int i;
> > >
> > >   TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > > - unsigned int i;
> > >   struct mlx5_ind_table_ibv *ind_tbl = NULL;
> > >
> > >   if (flow->drop) {
> > > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct
> > mlx5_flows *list)
> > >   DRV_LOG(DEBUG, "port %u flow %p removed", dev->data->port_id,
> > >   (void *)flow);
> > >   }
> > > + /* Cleanup Rx queue tunnel info. */
> > > + for (i = 0; i != priv->rxqs_n; ++i) {
> > > + struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > > + struct mlx5_rxq_ctrl *rxq_ctrl =
> > > + container_of(q, struct mlx5_rxq_ctrl, rxq);
> > > +
> > > + memset((void *)rxq_ctrl->tunnel_types, 0,
> > > +sizeof(rxq_ctrl->tunnel_types));
> > > + q->tunnel = 0;
> > > + }
> > >  }
> > 
> > This hunk does not handle the fact the Rx queue array may have some holes
> > i.e. the application is allowed to ask for 10 queues and only initialise
> > some.  In such situation this code will segfault.
> 
> In other words, "q" could be NULL, correct? I'll add check for this.

Correct.

> BTW, there should be an action item to add such check in rss/queue flow 
> creation.

As it is the responsibility of the application/user to make rules according
to what it has configured, it has not been added.  It can still be
added, but it cannot be considered as a fix.

> > It should only memset the Rx queues making part of the flow not the others.
> 
> Cleaning this (decreasing the tunnel_types counter of each queue) for each
> flow would be time-consuming.

Considering flows already rely on syscalls to communicate with
the kernel, the extra cycles consumed to clear only the queues that are
part of this flow are negligible.

By the way in the same function the mark is clear

Re: [dpdk-dev] [PATCH v2 3/4] ether: add more protocol support in flow API

2018-04-12 Thread Zhang, Qi Z


> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Thursday, April 12, 2018 5:20 PM
> To: Zhang, Qi Z 
> Cc: dev@dpdk.org; Doherty, Declan ; Chandran,
> Sugesh ; Glynn, Michael J
> ; Liu, Yu Y ; Ananyev,
> Konstantin ; Richardson, Bruce
> 
> Subject: Re: [PATCH v2 3/4] ether: add more protocol support in flow API
> 
> On Thu, Apr 12, 2018 at 05:12:08AM +, Zhang, Qi Z wrote:
> > Hi Adrien:
> >
> > Thank you so much for your careful review and helpful suggestions!
> > I agree with most of your comments, except a couple of questions about
> > RTE_FLOW_ITEM_TYPE_TGT_ADDR and RTE_FLOW_ITEM_IPV6_EXT_HDR
> > Please see my comment inline.
> >
> > Thanks!
> > Qi
> 
> Thanks, replying inline also.
> 
> > > -Original Message-
> > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > > Sent: Thursday, April 12, 2018 12:32 AM
> > > To: Zhang, Qi Z 
> > > Cc: dev@dpdk.org; Doherty, Declan ;
> > > Chandran, Sugesh ; Glynn, Michael J
> > > ; Liu, Yu Y ;
> > > Ananyev, Konstantin ; Richardson,
> > > Bruce 
> > > Subject: Re: [PATCH v2 3/4] ether: add more protocol support in flow
> > > API
> > >
> > > On Sun, Apr 01, 2018 at 05:19:21PM -0400, Qi Zhang wrote:
> > > > Add new protocol header match support as below
> > > >
> > > > RTE_FLOW_ITEM_TYPE_ARP
> > > > - match IPv4 ARP header
> > > > RTE_FLOW_ITEM_TYPE_EXT_HDR_ANY
> > > > - match any IPv6 extension header
> > >
> > > While properly defined in the patch, "IPV6" is missing here.
> > >
> > > > RTE_FLOW_ITEM_TYPE_ICMPV6
> > > > - match IPv6 ICMP header
> > > > RTE_FLOW_ITEM_TYPE_ICMPV6_TGT_ADDR
> > > > - match IPv6 ICMP Target address
> > > > RTE_FLOW_ITEM_TYPE_ICMPV6_SSL
> > > > - match IPv6 ICMP Source Link-layer address
> > > > RTE_FLOW_ITEM_TYPE_ICMPV6_TTL
> > > > - match IPv6 ICMP Target Link-layer address
> > > >
> > > > Signed-off-by: Qi Zhang 
> > >
> > > First, since they are added at the end of enum rte_flow_item_type,
> > > no ABI breakage notice is necessary.
> > >
> > > However testpmd implementation [1][2] and documentation update
> > > [3][4] are mandatory for all new pattern items and actions.
> >
> > OK, will add this into v2.
> >
> > >
> > > More comments below regarding these definitions.
> > >
> > > [1] flow_item[] in app/test-pmd/config.c [2] using ITEM_ICMP as an
> > > example in app/test-pmd/cmdline_flow.c [3] "Pattern items" section
> > > in doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > > [4] using "Item: ``ICMP``" section as an example in
> > > doc/guides/prog_guide/rte_flow.rst
> > >
> > > > ---
> > > >  lib/librte_ether/rte_flow.h | 160
> > > > 
> > > >  1 file changed, 160 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ether/rte_flow.h
> > > > b/lib/librte_ether/rte_flow.h index 8f75db0..a8ec780 100644
> > > > --- a/lib/librte_ether/rte_flow.h
> > > > +++ b/lib/librte_ether/rte_flow.h
> > > > @@ -323,6 +323,49 @@ enum rte_flow_item_type {
> > > >  * See struct rte_flow_item_geneve.
> > > >  */
> > > > RTE_FLOW_ITEM_TYPE_GENEVE,
> > > > +
> > > > +   /**
> > > > +* Matches ARP IPv4 header.
> > >
> > > => Matches an IPv4 ARP header.
> > >
> > > > +*
> > > > +* See struct rte_flow_item_arp.
> > > > +*/
> > > > +   RTE_FLOW_ITEM_TYPE_ARP,
> > >
> > > While you're right to make "IPv4" clear since ARP is also used for
> > > other protocols DPDK doesn't support (and likely never will), the
> > > ARP header has both a fixed and a variably-sized part.
> > >
> > > Ideally an ARP pattern item should match the fixed part only and a
> > > separate
> > > ARP_IPV4 match its payload, somewhat like you did for ICMPv6/NDP
> below.
> > >
> > > Problem is that in DPDK, struct arp_hdr includes struct arp_ipv4, so
> > > one suggestion would be to rename this pattern item ARP_IPV4 directly:
> > >
> > > => RTE_FLOW_ITEM_TYPE_ARP_IPV4
> > >
> > > > +
> > > > +   /**
> > > > +* Matches any IPv6 Extension header.
> > >
> > > => Matches an IPv6 extension header.
> > >
> > > > +*
> > > > +* See struct rte_flow_item_ipv6_ext_any.
> > > > +*/
> > > > +   RTE_FLOW_ITEM_TYPE_IPV6_EXT_HDR_ANY,
> > >
> > > I'm not sure this definition is necessary, more below about that.
> > >
> > > Also I don't see a benefit in having "ANY" part of the name, if you
> > > want to keep it, I suggest the simpler:
> > >
> > > => RTE_FLOW_ITEM_TYPE_IPV6_EXT
> > >
> > > > +
> > > > +   /**
> > > > +* Matches ICMPv6 header.
> > >
> > > => Matches an ICMPv6 header.
> > >
> > > > +*
> > > > +* See struct rte_flow_item_icmpv6
> > >
> > > Missing "."
> > >
> > > > +*/
> > > > +   RTE_FLOW_ITEM_TYPE_ICMPV6,
> > > > +
> > >
> > > Before entering NDP territory below, I understand those should be
> > > stacked on top of RTE_FLOW_ITEM_TYPE_ICMPV6. It's fine but for
> > > clarity they should

Re: [dpdk-dev] [PATCH v2 4/4] ether: add packet modification aciton in flow API

2018-04-12 Thread Adrien Mazarguil
On Thu, Apr 12, 2018 at 08:50:14AM +, Zhang, Qi Z wrote:
> Hi Adrien
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Thursday, April 12, 2018 3:04 PM
> > To: Zhang, Qi Z 
> > Cc: dev@dpdk.org; Doherty, Declan ; Chandran,
> > Sugesh ; Glynn, Michael J
> > ; Liu, Yu Y ; Ananyev,
> > Konstantin ; Richardson, Bruce
> > 
> > Subject: Re: [PATCH v2 4/4] ether: add packet modification aciton in flow 
> > API
> > 
> > On Sun, Apr 01, 2018 at 05:19:22PM -0400, Qi Zhang wrote:
> > > Add new actions that be used to modify packet content with generic
> > > semantic:
> > >
> > > RTE_FLOW_ACTION_TYPE_FIELD_UPDATE:
> > >   - update specific field of packet
> > > RTE_FLWO_ACTION_TYPE_FIELD_INCREMENT:
> > >   - increament specific field of packet
> > > RTE_FLWO_ACTION_TYPE_FIELD_DECREMENT:
> > >   - decreament specific field of packet
> > > RTE_FLWO_ACTION_TYPE_FIELD_COPY:
> > >   - copy data from one field to another in packet.
> > >
> > > All action use struct rte_flow_item parameter to match the pattern
> > > that going to be modified, if no pattern match, the action just be
> > > skipped.
> > 
> > That's not good. It must result in undefined behavior, more about that 
> > below.
> 
> I may not get your point, see my below comment.
> 
> > 
> > > These action are non-terminating action. they will not impact the fate
> > > of the packets.
> > >
> > > Signed-off-by: Qi Zhang 
> > 
> > Noticed a few typos above and in subject line ("aciton", "FLWO", 
> > "increament",
> > "decreament").
> > 
> > Note that I'm usually against using rte_flow_item structures and associated
> > enum values inside action lists because it could be seen as inconsistent 
> > from
> > an API standpoint. On the other hand, reusing existing types is a good 
> > thing so
> > let's go with that for now.
> > 
> > Please see inline comments.
> > 
> > > ---
> > >  doc/guides/prog_guide/rte_flow.rst | 89
> > ++
> > >  lib/librte_ether/rte_flow.h| 57 
> > >  2 files changed, 146 insertions(+)
> > >
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > > b/doc/guides/prog_guide/rte_flow.rst
> > > index aa5c818..6628964 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -1508,6 +1508,95 @@ Representor.
> > > | ``port_id``  | identification of the destination |
> > > +--+---+
> > >
> > > +Action: ``FILED_UPDATE``
> > > +^^^
> > 
> > FILED => FIELD
> > 
> > Underline is also shorter than title and might cause documentation warnings.
> > 
> > > +
> > > +Update specific field of the packet.
> > > +
> > > +- Non-terminating by default.
> > 
> > These statements are not needed since "ethdev: alter behavior of flow API
> > actions" [1].
> > 
> > [1] http://dpdk.org/ml/archives/dev/2018-April/096527.html
> > 
> > > +
> > > +.. _table_rte_flow_action_field_update:
> > > +
> > > +.. table:: FIELD_UPDATE
> > > +
> > > +   +-----------+---------------------------------------------------------+
> > > +   | Field     | Value                                                   |
> > > +   +===========+=========================================================+
> > > +   | ``item``  | item->type: specify the pattern to modify               |
> > > +   |           | item->spec: specify the new value to update             |
> > > +   |           | item->mask: specify which part of the pattern to update |
> > > +   |           | item->last: ignored                                     |
> > 
> > This table needs to be divided a bit more with one cell per field for better
> > clarity. See other pattern item definitions such as "Item: ``RAW``" for an
> > example.
> > 
> > > +   +-----------+---------------------------------------------------------+
> > > +   | ``layer`` | 0 means outermost matched pattern,                      |
> > > +   |           | 1 means next-to-outermost and so on ...                 |
> > > +   +-----------+---------------------------------------------------------+
> > 
> > What does "layer" refer to by the way? The layer described on the pattern 
> > side
> > of the flow rule, the actual protocol layer matched inside traffic, or is 
> > "item"
> > actually an END-terminated list of items (as suggested by "pattern" in above
> > documentation)?
> > 
> > I suspect the intent is for layer to have the same definition as RSS 
> > encapsulation
> > level ("ethdev: add encap level to RSS flow API action" [2]), and item 
> > points to
> > a single item, correct?
> 
> Yes
> > 
> > In that case, it's misleading, please rename it "level". Also keep in mind 
> > you
> > can't make an action rely on anything found on the pattern side of a flow 
> > rule.
> > 
> OK, "Level" looks better.
> Also, I may not be getting your point here, so please correct me.
> My understanding is that all modification actions of a flow are independent of
> the patterns of the same flow.
> For example when define a flow with pattern

[dpdk-dev] [PATCHv2] linuxapp eal: set fd to -1 for MAP_ANONYMOUS cases

2018-04-12 Thread Neil Horman
https://dpdk.org/tracker/show_bug.cgi?id=18

Indicated that several mmap call sites in the [linux|bsd]app eal code
set fd that was not -1 in their calls while using MAP_ANONYMOUS.  While
probably not a huge deal, the man page does say the fd should be -1 for
portability, as some implementations don't ignore fd as they should for
MAP_ANONYMOUS.

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: Ferruh Yigit 

---
Change notes

v2) Rebased to HEAD again to adjust for patches that landed ahead of
this
---
 lib/librte_eal/bsdapp/eal/eal_memory.c   | 2 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c 
b/lib/librte_eal/bsdapp/eal/eal_memory.c
index b27262c7e..a5e034789 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -70,7 +70,7 @@ rte_eal_hugepage_init(void)
 
addr = mmap(NULL, internal_config.memory,
PROT_READ | PROT_WRITE,
-   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 7cdd3048e..b7a2e951d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1329,7 +1329,7 @@ eal_legacy_hugepage_init(void)
}
 
addr = mmap(NULL, internal_config.memory, PROT_READ | 
PROT_WRITE,
-   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
-- 
2.14.3



Re: [dpdk-dev] [PATCH v5 02/21] eal: list acceptable init priorities

2018-04-12 Thread Neil Horman
On Wed, Apr 11, 2018 at 02:04:03AM +0200, Gaetan Rivet wrote:
> Build a central list to quickly see each priority used by
> constructors, allowing one to verify that they are both above 100 and in the
> proper order.
> 
> Signed-off-by: Gaetan Rivet 
> Acked-by: Neil Horman 
> Acked-by: Shreyansh Jain 
> ---
>  lib/librte_eal/common/eal_common_log.c | 2 +-
>  lib/librte_eal/common/include/rte_bus.h| 2 +-
>  lib/librte_eal/common/include/rte_common.h | 8 +++-
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_log.c 
> b/lib/librte_eal/common/eal_common_log.c
> index a27192620..36b9d6e08 100644
> --- a/lib/librte_eal/common/eal_common_log.c
> +++ b/lib/librte_eal/common/eal_common_log.c
> @@ -260,7 +260,7 @@ static const struct logtype logtype_strings[] = {
>  };
>  
>  /* Logging should be first initializer (before drivers and bus) */
> -RTE_INIT_PRIO(rte_log_init, 101);
> +RTE_INIT_PRIO(rte_log_init, LOG);
>  static void
>  rte_log_init(void)
>  {
> diff --git a/lib/librte_eal/common/include/rte_bus.h 
> b/lib/librte_eal/common/include/rte_bus.h
> index 6fb08341a..eb9eded4e 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -325,7 +325,7 @@ enum rte_iova_mode rte_bus_get_iommu_class(void);
>   * The constructor has higher priority than PMD constructors.
>   */
>  #define RTE_REGISTER_BUS(nm, bus) \
> -RTE_INIT_PRIO(businitfn_ ##nm, 110); \
> +RTE_INIT_PRIO(businitfn_ ##nm, BUS); \
>  static void businitfn_ ##nm(void) \
>  {\
>   (bus).name = RTE_STR(nm);\
> diff --git a/lib/librte_eal/common/include/rte_common.h 
> b/lib/librte_eal/common/include/rte_common.h
> index 6c5bc5a76..8f04518f7 100644
> --- a/lib/librte_eal/common/include/rte_common.h
> +++ b/lib/librte_eal/common/include/rte_common.h
> @@ -81,6 +81,12 @@ typedef uint16_t unaligned_uint16_t;
>   */
>  #define RTE_SET_USED(x) (void)(x)
>  
> +#define RTE_PRIORITY_LOG 101
> +#define RTE_PRIORITY_BUS 110
> +
> +#define RTE_PRIO(prio) \
> + RTE_PRIORITY_ ## prio
> +
>  /**
>   * Run function before main() with low priority.
>   *
> @@ -102,7 +108,7 @@ static void __attribute__((constructor, used)) func(void)
>   *   Lowest number is the first to run.
>   */
>  #define RTE_INIT_PRIO(func, prio) \
> -static void __attribute__((constructor(prio), used)) func(void)
> +static void __attribute__((constructor(RTE_PRIO(prio)), used)) func(void)
>  
It just occurred to me that perhaps you should add an RTE_PRIORITY_LAST priority,
and redefine RTE_INIT to RTE_INIT_PRIO(func, RTE_PRIORITY_LAST) for clarity.  I
presume that constructors with no explicit priority run last, but the gcc
manual doesn't explicitly say that.  It would be a heck of a bug to track down
if somehow unprioritized constructors ran early.

Neil

>  /**
>   * Force a function to be inlined
> -- 
> 2.11.0
> 
> 


Re: [dpdk-dev] [PATCH 1/2] net/tap: add tun support

2018-04-12 Thread Ophir Munk
Hi Vipin,
This patch (adding TUN to TAP) has been Acked and accepted in next-net branch.
I have some questions regarding the implementation (please find below).

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Vipin Varghese
> Sent: Tuesday, April 03, 2018 12:38 AM
> To: dev@dpdk.org; pascal.ma...@6wind.com; ferruh.yi...@intel.com
> Cc: Vipin Varghese 
> Subject: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> The change adds functional TUN PMD logic to the existing TAP PMD.
> The TUN PMD can be initialized with 'net_tunX', where 'X' represents a unique id.
> The PMD supports the interface argument, while MAC address and remote are not
> supported.
> 

[...]

> 
> + /*
> +  * TUN and TAP are created with IFF_NO_PI disabled.
> +  * For the TUN PMD this is mandatory, as the fields are used by
> +  * the kernel's tun.c to determine whether it is an IP or
> +  * non-IP packet.
> +  *
> +  * The logic fetches the first byte of data from the mbuf and
> +  * compares whether it is v4 or v6. If neither matches, the
> +  * default value 0x00 is used for the protocol field.
> +  */
> + char *buff_data = rte_pktmbuf_mtod(seg, void *);
> + j = (*buff_data & 0xf0);
> + if (j & (0x40 | 0x60))
> + pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> +

1. Accessing the first byte here assumes it is the first byte of the IP header
(layer 3), which is correct for TUN.
For TAP, however, the first byte belongs to the Ethernet destination address
(layer 2).
Please explain how this logic will work for TAP.

2. If the first TUN byte contains 0x2X (which is neither IPv4 nor IPv6), it will
end up setting pi.proto to 0xdd86.
Please explain how this logic will work for non-IP packets in TUN



Re: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to improve zero copy performance

2018-04-12 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Junjie Chen
> Sent: Thursday, April 12, 2018 6:32 AM
> To: Xing, Beilei ; Zhang, Qi Z 
> Cc: dev@dpdk.org; Chen, Junjie J ; c...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to improve 
> zero copy performance
> 
> From: "Chen, Junjie" 
> 
> When the vhost backend works in dequeue zero copy mode, the NIC holds
> virtio's buffers until tx_free_threshold or fewer free buffers remain,
> and then frees a tx-burst-sized batch of buffers. This causes packet
> drops on the virtio side and hurts zero copy performance. So we need to
> increase tx_free_threshold to let the NIC free virtio's buffers as soon
> as possible. We also keep the upper limit at the tx max burst size to
> ensure the least performance impact on non zero copy.

Ok, but why can't the vhost app just use tx_queue_setup() to specify the
desired value for tx_free_thresh?
Why instead we have to modify PMD to satisfy needs of one app?
Konstantin

> 
> Signed-off-by: Chen, Junjie 
> ---
>  drivers/net/i40e/i40e_rxtx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 56a854cec..d9569bdc9 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -2039,6 +2039,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
>   tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
>   tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
>   tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> + if (tx_free_thresh < nb_desc - I40E_TX_MAX_BURST)
> + tx_free_thresh = nb_desc - I40E_TX_MAX_BURST;
>   if (tx_rs_thresh >= (nb_desc - 2)) {
>   PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the "
>"number of TX descriptors minus 2. "
> --
> 2.16.0



Re: [dpdk-dev] [PATCH v3 4/5] app/testpmd: introduce new tunnel VXLAN-GPE

2018-04-12 Thread Adrien Mazarguil
On Thu, Apr 12, 2018 at 03:33:23PM +0800, Xueming Li wrote:
> Add VXLAN-GPE support to csum forwarding engine and rte flow.
> 
> Signed-off-by: Xueming Li 

This commit still misses testpmd documentation for the new flow command
parameters ("Pattern items" section in
doc/guides/testpmd_app_ug/testpmd_funcs.rst).

Once addressed, as far as rte_flow is concerned (I did not review the csum
engine nor other configuration changes):

Acked-by: Adrien Mazarguil 

> ---
>  app/test-pmd/cmdline_flow.c   | 24 ++
>  app/test-pmd/config.c |  2 +
>  app/test-pmd/csumonly.c   | 83 
> +--
>  app/test-pmd/parameters.c | 12 -
>  app/test-pmd/testpmd.h|  2 +
>  doc/guides/testpmd_app_ug/run_app.rst |  5 +++
>  6 files changed, 124 insertions(+), 4 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> index f85c1c57f..0d3c62599 100644
> --- a/app/test-pmd/cmdline_flow.c
> +++ b/app/test-pmd/cmdline_flow.c
> @@ -154,6 +154,8 @@ enum index {
>   ITEM_GENEVE,
>   ITEM_GENEVE_VNI,
>   ITEM_GENEVE_PROTO,
> + ITEM_VXLAN_GPE,
> + ITEM_VXLAN_GPE_VNI,
>  
>   /* Validate/create actions. */
>   ACTIONS,
> @@ -470,6 +472,7 @@ static const enum index next_item[] = {
>   ITEM_GTPC,
>   ITEM_GTPU,
>   ITEM_GENEVE,
> + ITEM_VXLAN_GPE,
>   ZERO,
>  };
>  
> @@ -626,6 +629,12 @@ static const enum index item_geneve[] = {
>   ZERO,
>  };
>  
> +static const enum index item_vxlan_gpe[] = {
> + ITEM_VXLAN_GPE_VNI,
> + ITEM_NEXT,
> + ZERO,
> +};
> +
>  static const enum index next_action[] = {
>   ACTION_END,
>   ACTION_VOID,
> @@ -1560,6 +1569,21 @@ static const struct token token_list[] = {
>   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_geneve,
>protocol)),
>   },
> + [ITEM_VXLAN_GPE] = {
> + .name = "vxlan-gpe",
> + .help = "match VXLAN-GPE header",
> + .priv = PRIV_ITEM(VXLAN_GPE,
> +   sizeof(struct rte_flow_item_vxlan_gpe)),
> + .next = NEXT(item_vxlan_gpe),
> + .call = parse_vc,
> + },
> + [ITEM_VXLAN_GPE_VNI] = {
> + .name = "vni",
> + .help = "VXLAN-GPE identifier",
> + .next = NEXT(item_vxlan_gpe, NEXT_ENTRY(UNSIGNED), item_param),
> + .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_vxlan_gpe,
> +  vni)),
> + },
>  
>   /* Validate/create actions. */
>   [ACTIONS] = {
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 4a273eff7..349eb9015 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -981,6 +981,7 @@ static const struct {
>   MK_FLOW_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
>   MK_FLOW_ITEM(GTPU, sizeof(struct rte_flow_item_gtp)),
>   MK_FLOW_ITEM(GENEVE, sizeof(struct rte_flow_item_geneve)),
> + MK_FLOW_ITEM(VXLAN_GPE, sizeof(struct rte_flow_item_vxlan_gpe)),
>  };
>  
>  /** Pattern item specification types. */
> @@ -3082,6 +3083,7 @@ flowtype_to_str(uint16_t flow_type)
>   {"vxlan", RTE_ETH_FLOW_VXLAN},
>   {"geneve", RTE_ETH_FLOW_GENEVE},
>   {"nvgre", RTE_ETH_FLOW_NVGRE},
> + {"vxlan-gpe", RTE_ETH_FLOW_VXLAN_GPE},
>   };
>  
>   for (i = 0; i < RTE_DIM(flowtype_str_table); i++) {
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index 5f5ab64aa..d98c51648 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -60,6 +60,8 @@
>  #define _htons(x) (x)
>  #endif
>  
> +uint16_t vxlan_gpe_udp_port = 4790;
> +
>  /* structure that caches offload info for the current packet */
>  struct testpmd_offload_info {
>   uint16_t ethertype;
> @@ -194,6 +196,70 @@ parse_vxlan(struct udp_hdr *udp_hdr,
>   info->l2_len += ETHER_VXLAN_HLEN; /* add udp + vxlan */
>  }
>  
> +/* Parse a vxlan-gpe header */
> +static void
> +parse_vxlan_gpe(struct udp_hdr *udp_hdr,
> + struct testpmd_offload_info *info)
> +{
> + struct ether_hdr *eth_hdr;
> + struct ipv4_hdr *ipv4_hdr;
> + struct ipv6_hdr *ipv6_hdr;
> + struct vxlan_gpe_hdr *vxlan_gpe_hdr;
> + uint8_t vxlan_gpe_len = sizeof(*vxlan_gpe_hdr);
> +
> + /* Check udp destination port. */
> + if (udp_hdr->dst_port != _htons(vxlan_gpe_udp_port))
> + return;
> +
> + vxlan_gpe_hdr = (struct vxlan_gpe_hdr *)((char *)udp_hdr +
> + sizeof(struct udp_hdr));
> +
> + if (!vxlan_gpe_hdr->proto || vxlan_gpe_hdr->proto ==
> + VXLAN_GPE_TYPE_IPv4) {
> + info->is_tunnel = 1;
> + info->outer_ethertype = info->ethertype;
> + info->outer_l2_len = info->l2_len;
> + info->outer_l3_len = info->l3_len;
> + info->outer_l4_proto = info->l4_pro

Re: [dpdk-dev] [PATCH v3 2/5] ethdev: introduce new tunnel VXLAN-GPE

2018-04-12 Thread Adrien Mazarguil
On Thu, Apr 12, 2018 at 03:33:21PM +0800, Xueming Li wrote:
> VXLAN-GPE enables VXLAN for all protocols. Protocol link:
> https://www.ietf.org/id/draft-ietf-nvo3-vxlan-gpe-05.txt
> 
> Signed-off-by: Xueming Li 

A couple of remaining minor comments, see below. Once addressed:

Acked-by: Adrien Mazarguil 

> ---
>  doc/guides/prog_guide/rte_flow.rst | 12 
>  lib/librte_ether/rte_eth_ctrl.h|  3 ++-
>  lib/librte_ether/rte_flow.c|  1 +
>  lib/librte_ether/rte_flow.h| 27 +++
>  lib/librte_mbuf/rte_mbuf.c |  3 +++
>  lib/librte_mbuf/rte_mbuf.h |  1 +
>  lib/librte_mbuf/rte_mbuf_ptype.c   |  1 +
>  lib/librte_mbuf/rte_mbuf_ptype.h   | 13 +
>  lib/librte_net/rte_ether.h | 25 +
>  9 files changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst 
> b/doc/guides/prog_guide/rte_flow.rst
> index 91dbd61a0..9d92d4e1e 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1044,6 +1044,18 @@ Matches a GENEVE header.
>  - ``rsvd1``: reserved, normally 0x00.
>  - Default ``mask`` matches VNI only.
>  
> +Item: ``VXLAN-GPE``
> +^^^
> +
> +Matches a VXLAN-GPE header (draft-ietf-nvo3-vxlan-gpe-05).
> +
> +- ``flags``: normally 0x0C (I and P flag).

Minor nit:

=> - ``flags``: normally 0x0c (I and P flags).

> +- ``rsvd0``: reserved, normally 0x0000.
> +- ``protocol``: protocol type.
> +- ``vni``: VXLAN network identifier.
> +- ``rsvd1``: reserved, normally 0x00.
> +- Default ``mask`` matches VNI only.
> +
>  Actions
>  ~~~
>  
> diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
> index 668f59acb..5ea8ae24c 100644
> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -54,7 +54,8 @@ extern "C" {
>  #define RTE_ETH_FLOW_VXLAN  19 /**< VXLAN protocol based flow */
>  #define RTE_ETH_FLOW_GENEVE 20 /**< GENEVE protocol based flow */
>  #define RTE_ETH_FLOW_NVGRE  21 /**< NVGRE protocol based flow */
> -#define RTE_ETH_FLOW_MAX22
> +#define RTE_ETH_FLOW_VXLAN_GPE  22 /**< VXLAN-GPE protocol based 
> flow */
> +#define RTE_ETH_FLOW_MAX23
>  
>  /**
>   * Feature filter types
> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> index 3d8116ebd..58ec80f42 100644
> --- a/lib/librte_ether/rte_flow.c
> +++ b/lib/librte_ether/rte_flow.c
> @@ -55,6 +55,7 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] 
> = {
>   MK_FLOW_ITEM(E_TAG, sizeof(struct rte_flow_item_e_tag)),
>   MK_FLOW_ITEM(NVGRE, sizeof(struct rte_flow_item_nvgre)),
>   MK_FLOW_ITEM(GENEVE, sizeof(struct rte_flow_item_geneve)),
> + MK_FLOW_ITEM(VXLAN_GPE, sizeof(struct rte_flow_item_vxlan_gpe)),
>  };
>  
>  /** Generate flow_action[] entry. */
> diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
> index bed727df8..fefd69920 100644
> --- a/lib/librte_ether/rte_flow.h
> +++ b/lib/librte_ether/rte_flow.h
> @@ -335,6 +335,13 @@ enum rte_flow_item_type {
>* See struct rte_flow_item_geneve.
>*/
>   RTE_FLOW_ITEM_TYPE_GENEVE,
> +
> + /**
> +  * Matches a VXLAN-GPE header (draft-ietf-nvo3-vxlan-gpe-05).

Draft reference is unnecessary here, it should be provided further down with
the structure definition as for VXLAN.

> +  *
> +  * See struct rte_flow_item_vxlan_gpe.
> +  */
> + RTE_FLOW_ITEM_TYPE_VXLAN_GPE,
>  };
>  
>  /**
> @@ -864,6 +871,26 @@ static const struct rte_flow_item_geneve 
> rte_flow_item_geneve_mask = {
>  #endif
>  
>  /**
> + * RTE_FLOW_ITEM_TYPE_VXLAN_GPE.
> + *
> + * Matches a VXLAN-GPE header.

Here:

=> Matches a VXLAN-GPE header (draft-ietf-nvo3-vxlan-gpe-05).

> + */
> +struct rte_flow_item_vxlan_gpe {
> + uint8_t flags; /**< Normally 0x0c (I and P flag). */

flag => flags

> + uint8_t rsvd0[2]; /**< Reserved, normally 0x0000. */
> + uint8_t protocol; /**< Protocol type. */
> + uint8_t vni[3]; /**< VXLAN identifier. */
> + uint8_t rsvd1; /**< Reserved, normally 0x00. */
> +};
> +
> +/** Default mask for RTE_FLOW_ITEM_TYPE_VXLAN_GPE. */
> +#ifndef __cplusplus
> +static const struct rte_flow_item_vxlan_gpe rte_flow_item_vxlan_gpe_mask = {
> + .vni = "\xff\xff\xff",
> +};
> +#endif
> +
> +/**
>   * Matching pattern item definition.
>   *
>   * A pattern is formed by stacking items starting from the lowest protocol
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 091d388d3..dc90379e5 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -405,6 +405,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
>   case PKT_TX_TUNNEL_IPIP: return "PKT_TX_TUNNEL_IPIP";
>   case PKT_TX_TUNNEL_GENEVE: return "PKT_TX_TUNNEL_GENEVE";
>   case PKT_TX_TUNNEL_MPLSINUDP: return "PKT_TX_TUNNEL_MPLSINUDP";
> + case PKT_TX_TU

Re: [dpdk-dev] [PATCHv2] linuxapp eal: set fd to -1 for MAP_ANONYMOUS cases

2018-04-12 Thread Burakov, Anatoly

On 12-Apr-18 12:16 PM, Neil Horman wrote:

https://dpdk.org/tracker/show_bug.cgi?id=18

Indicated that several mmap call sites in the [linux|bsd]app eal code
set fd that was not -1 in their calls while using MAP_ANONYMOUS.  While
probably not a huge deal, the man page does say the fd should be -1 for
portability, as some implementations don't ignore fd as they should for
MAP_ANONYMOUS.

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
CC: Ferruh Yigit 

---
Change notes

v2) Rebased to HEAD again to adjust for patches that landed ahead of
this
---


Acked-by: Anatoly Burakov 

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v6 1/2] vhost: add support for interrupt mode

2018-04-12 Thread Tan, Jianfeng



On 4/13/2018 12:28 AM, Junjie Chen wrote:

In some cases we want vhost dequeue to work in interrupt mode to
release CPUs to others when there is no data to transmit. So we install
an interrupt handler for the vhost device and interrupt vectors for each
rx queue when creating a new backend, according to the vhost interrupt
configuration. Thus, applications can register an epoll event fd
to associate rx queues with interrupt vectors.

Signed-off-by: Junjie Chen 


Reviewed-by: Jianfeng Tan 


---
Changes in v6:
- rebase code to master
Changes in v5:
- update license to DPDK new license format
- rebase code to master
Changes in v4:
- revert back license change
Changes in v3:
- handle failure in the middle of intr setup.
- use vhost API to enable interrupt.
- rebase to check rxq existence.
- update vhost API to support guest notification.
Changes in v2:
- update rx queue index.
- fill efd_counter_size for intr handler.
- update log.
  drivers/net/vhost/rte_eth_vhost.c | 158 +-
  lib/librte_vhost/vhost.c  |  14 ++--
  2 files changed, 163 insertions(+), 9 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index e392d71..8b4b716 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -520,6 +520,136 @@ find_internal_resource(char *ifname)
return list;
  }
  
+static int

+eth_rxq_intr_enable(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct rte_vhost_vring vring;
+   struct vhost_queue *vq;
+   int ret = 0;
+
+   vq = dev->data->rx_queues[qid];
+   if (!vq) {
+   RTE_LOG(ERR, PMD, "rxq%d is not setup yet\n", qid);
+   return -1;
+   }
+
+   ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
+   if (ret < 0) {
+   RTE_LOG(ERR, PMD, "Failed to get rxq%d's vring\n", qid);
+   return ret;
+   }
+   RTE_LOG(INFO, PMD, "Enable interrupt for rxq%d\n", qid);
+   rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 1);
+   rte_wmb();
+
+   return ret;
+}
+
+static int
+eth_rxq_intr_disable(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct rte_vhost_vring vring;
+   struct vhost_queue *vq;
+   int ret = 0;
+
+   vq = dev->data->rx_queues[qid];
+   if (!vq) {
+   RTE_LOG(ERR, PMD, "rxq%d is not setup yet\n", qid);
+   return -1;
+   }
+
+   ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
+   if (ret < 0) {
+   RTE_LOG(ERR, PMD, "Failed to get rxq%d's vring", qid);
+   return ret;
+   }
+   RTE_LOG(INFO, PMD, "Disable interrupt for rxq%d\n", qid);
+   rte_vhost_enable_guest_notification(vq->vid, (qid << 1) + 1, 0);
+   rte_wmb();
+
+   return 0;
+}
+
+static void
+eth_vhost_uninstall_intr(struct rte_eth_dev *dev)
+{
+   struct rte_intr_handle *intr_handle = dev->intr_handle;
+
+   if (intr_handle) {
+   if (intr_handle->intr_vec)
+   free(intr_handle->intr_vec);
+   free(intr_handle);
+   }
+
+   dev->intr_handle = NULL;
+}
+
+static int
+eth_vhost_install_intr(struct rte_eth_dev *dev)
+{
+   struct rte_vhost_vring vring;
+   struct vhost_queue *vq;
+   int count = 0;
+   int nb_rxq = dev->data->nb_rx_queues;
+   int i;
+   int ret;
+
+   /* uninstall firstly if we are reconnecting */
+   if (dev->intr_handle)
+   eth_vhost_uninstall_intr(dev);
+
+   dev->intr_handle = malloc(sizeof(*dev->intr_handle));
+   if (!dev->intr_handle) {
+   RTE_LOG(ERR, PMD, "Fail to allocate intr_handle\n");
+   return -ENOMEM;
+   }
+   memset(dev->intr_handle, 0, sizeof(*dev->intr_handle));
+
+   dev->intr_handle->efd_counter_size = sizeof(uint64_t);
+
+   dev->intr_handle->intr_vec =
+   malloc(nb_rxq * sizeof(dev->intr_handle->intr_vec[0]));
+
+   if (!dev->intr_handle->intr_vec) {
+   RTE_LOG(ERR, PMD,
+   "Failed to allocate memory for interrupt vector\n");
+   free(dev->intr_handle);
+   return -ENOMEM;
+   }
+
+   RTE_LOG(INFO, PMD, "Prepare intr vec\n");
+   for (i = 0; i < nb_rxq; i++) {
+   vq = dev->data->rx_queues[i];
+   if (!vq) {
+   RTE_LOG(INFO, PMD, "rxq-%d not setup yet, skip!\n", i);
+   continue;
+   }
+
+   ret = rte_vhost_get_vhost_vring(vq->vid, (i << 1) + 1, &vring);
+   if (ret < 0) {
+   RTE_LOG(INFO, PMD,
+   "Failed to get rxq-%d's vring, skip!\n", i);
+   continue;
+   }
+
+   if (vring.kickfd < 0) {
+   RTE_LOG(INFO, PMD,
+   "rxq-%d's kickfd is invalid, skip!\n", i);
+   cont

[dpdk-dev] [PATCH] lib/librte_hash: fix incorrect comment for lookup

2018-04-12 Thread Shreyansh Jain
rte_hash_lookup_with_hash() has a wrong comment for its 'sig' parameter.

Fixes: 1a9f648be291 ("hash: fix for multi-process apps")

Signed-off-by: Shreyansh Jain 
---
 lib/librte_hash/rte_hash.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 3beaca71c..f71ca9fbf 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -328,7 +328,7 @@ rte_hash_lookup(const struct rte_hash *h, const void *key);
  * @param key
  *   Key to find.
  * @param sig
- *   Hash value to remove from the hash table.
+ *   Precomputed hash value for 'key'.
  * @return
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if the key is not found.
-- 
2.14.1



Re: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to improve zero copy performance

2018-04-12 Thread Zhang, Qi Z
Hi Junjie:

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Thursday, April 12, 2018 7:52 PM
> To: Chen, Junjie J ; Xing, Beilei
> ; Zhang, Qi Z 
> Cc: dev@dpdk.org; Chen, Junjie J ; c...@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to
> improve zero copy performance
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Junjie Chen
> > Sent: Thursday, April 12, 2018 6:32 AM
> > To: Xing, Beilei ; Zhang, Qi Z
> > 
> > Cc: dev@dpdk.org; Chen, Junjie J ;
> > c...@dpdk.org
> > Subject: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to
> > improve zero copy performance
> >
> > From: "Chen, Junjie" 
> >
> > When vhost backend works in dequeue zero copy mode, nic locks virtio's
> > buffer until there is less or equal than tx_free_threshold buffer
> > remain and then free number of tx burst buffer. This causes packets
> > drop in virtio side and impacts zero copy performance. So we need to
> > increase the tx_free_threshold to let nic free virtio's buffer as soon as
> possible.
> > Also we keep the upper limit to tx max burst size to ensure least
> > performance impact on non zero copy.
> 
> Ok but why vhost app can't just use tx_queue_setup() to specify desired value
> for tx_free_thresh?
> Why instead we have to modify PMD to satisfy needs of one app?
> Konstantin

I think the commit log could include the explanation that this change is
proven not to impact the driver's performance and that it reduces the total
memory locked by PMD Tx, so it basically benefits any application sharing
the same mempool; vhost dequeue zero copy is one example.

> 
> >
> > Signed-off-by: Chen, Junjie 
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > b/drivers/net/i40e/i40e_rxtx.c index 56a854cec..d9569bdc9 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -2039,6 +2039,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev
> *dev,
> > tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
> > tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> > tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> > +   if (tx_free_thresh < nb_desc - I40E_TX_MAX_BURST)
> > +   tx_free_thresh = nb_desc - I40E_TX_MAX_BURST;

I think we'd better still allow the application to set tx_free_thresh, since a
small tx_free_thresh may still help the driver handle the first strike after
the device is restarted.
So, nb_desc - I40E_TX_MAX_BURST should only be applied when
tx_conf->tx_rs_thresh = 0

Regards
Qi

> > if (tx_rs_thresh >= (nb_desc - 2)) {
> > PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the "
> >  "number of TX descriptors minus 2. "
> > --
> > 2.16.0



Re: [dpdk-dev] [PATCH v6 2/2] net/vhost: update license to SPDX format

2018-04-12 Thread Maxime Coquelin



On 04/12/2018 06:43 PM, Junjie Chen wrote:

Update license to SPDX, also add Intel license.

Signed-off-by: Junjie Chen 
---
  drivers/net/vhost/rte_eth_vhost.c | 34 +++---
  drivers/net/vhost/rte_eth_vhost.h | 35 +++
  2 files changed, 6 insertions(+), 63 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 8b4b716..c6b8637 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1,34 +1,6 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright (c) 2016 IGEL Co., Ltd.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of IGEL Co.,Ltd. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 IGEL Co., Ltd.
+ * Copyright(c) 2016-2018 Intel Corporation
   */
  #include 
  #include 
diff --git a/drivers/net/vhost/rte_eth_vhost.h 
b/drivers/net/vhost/rte_eth_vhost.h
index 948f3c8..0e68b9f 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -1,36 +1,7 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2016 IGEL Co., Ltd.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of IGEL Co., Ltd. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 IGEL Co., Ltd.
+ * Copyright(c) 2016-2018 Intel Corporation
   */
-
  #ifndef _RTE_ETH_VHOST_H_
  #define _RTE_ETH_VHOST_H_
  



Acked-by: Maxime Coquelin 

I'll apply it for -rc2.

Thanks,
Maxime


Re: [dpdk-dev] [PATCH v4 00/11] event/octeontx: add event timer adapter driver

2018-04-12 Thread Jerin Jacob
-Original Message-
> Date: Tue, 10 Apr 2018 02:30:24 +0530
> From: Pavan Nikhilesh 
> To: jerin.ja...@caviumnetworks.com, santosh.shu...@caviumnetworks.com,
>  erik.g.carri...@intel.com
> Cc: dev@dpdk.org, Pavan Nikhilesh 
> Subject: [dpdk-dev] [PATCH v4 00/11] event/octeontx: add event timer
>  adapter driver
> X-Mailer: git-send-email 2.17.0
> 
> The event timer adapter[1] provides APIs to configure an event timer device
> that allows an application to arm timers which on expiry push events to an
> event device such as OcteonTx SSO.
> The OcteonTx TIM is a co-processor that can be configured as an event timer
> adapter which can be used by an application to manage event timers.
> 
> The TIM co-processor processes the event timers registered and pushes
> expired event timers to SSO based on the event queue, schedule type, flow
> id etc. provided as rte_event while arming the event timer. It maintains
> event timers with high precision and time granularity of 1us (microsecond).
> 
> [1] http://dpdk.org/dev/patchwork/patch/33525/

Applied the series to dpdk-next-eventdev/master. Thanks.

> 


Re: [dpdk-dev] [PATCHv2] linuxapp eal: set fd to -1 for MAP_ANONYMOUS cases

2018-04-12 Thread Thomas Monjalon
12/04/2018 14:05, Burakov, Anatoly:
> On 12-Apr-18 12:16 PM, Neil Horman wrote:
> > https://dpdk.org/tracker/show_bug.cgi?id=18
> > 
> > Indicated that several mmap call sites in the [linux|bsd]app eal code
> > set fd that was not -1 in their calls while using MAP_ANONYMOUS.  While
> > probably not a huge deal, the man page does say the fd should be -1 for
> > portability, as some implementations don't ignore fd as they should for
> > MAP_ANONYMOUS.
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > CC: Ferruh Yigit 
> > 
> > ---
> > Change notes
> > 
> > v2) Rebased to HEAD again to adjust for patches that landed ahead of
> > this
> > ---
> 
> Acked-by: Anatoly Burakov 

Applied, thanks




Re: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to improve zero copy performance

2018-04-12 Thread Bruce Richardson
On Thu, Apr 12, 2018 at 12:20:07PM +, Zhang, Qi Z wrote:
> Hi Junjie:
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Thursday, April 12, 2018 7:52 PM
> > To: Chen, Junjie J ; Xing, Beilei
> > ; Zhang, Qi Z 
> > Cc: dev@dpdk.org; Chen, Junjie J ; c...@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to
> > improve zero copy performance
> > 
> > 
> > 
> > > -Original Message-
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Junjie Chen
> > > Sent: Thursday, April 12, 2018 6:32 AM
> > > To: Xing, Beilei ; Zhang, Qi Z
> > > 
> > > Cc: dev@dpdk.org; Chen, Junjie J ;
> > > c...@dpdk.org
> > > Subject: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to
> > > improve zero copy performance
> > >
> > > From: "Chen, Junjie" 
> > >
> > > When the vhost backend works in dequeue zero copy mode, the NIC holds
> > > virtio's buffers until tx_free_threshold or fewer free buffers
> > > remain, and only then frees a burst of Tx buffers. This causes packet
> > > drops on the virtio side and hurts zero copy performance. So we need to
> > > increase tx_free_threshold to let the NIC free virtio's buffers as soon
> > > as possible.
> > > We also keep the upper limit at the Tx max burst size to ensure the least
> > > performance impact on the non zero copy path.
> > 
> > Ok but why vhost app can't just use tx_queue_setup() to specify desired 
> > value
> > for tx_free_thresh?
> > Why instead we have to modify PMD to satisfy needs of one app?
> > Konstantin
> 
> I think the commit log could include the explanation that this change is
> proven not to impact the driver's performance and that it reduces the total
> memory locked by PMD Tx, so it basically benefits any application sharing
> the same mempool; vhost dequeue zero copy is one example.
> 
> > 
> > >
> > > Signed-off-by: Chen, Junjie 
> > > ---
> > >  drivers/net/i40e/i40e_rxtx.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > b/drivers/net/i40e/i40e_rxtx.c index 56a854cec..d9569bdc9 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -2039,6 +2039,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev
> > *dev,
> > >   tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
> > >   tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> > >   tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> > > + if (tx_free_thresh < nb_desc - I40E_TX_MAX_BURST)
> > > + tx_free_thresh = nb_desc - I40E_TX_MAX_BURST;
> 
> I think we'd better still allow the application to set tx_free_thresh, since a
> small tx_free_thresh may still help the driver handle the first
> strike after the device restarts.
> So, nb_desc - I40E_TX_MAX_BURST should only be set when tx_conf->tx_rs_thresh = 0
> 
> Regards
> Qi
> 
+1 for just changing in this case.

/Bruce



[dpdk-dev] [PATCH] eal: fix compilation without VFIO

2018-04-12 Thread Shahaf Shuler
A compilation error occurred when compiling with CONFIG_RTE_EAL_VFIO=n:

== Build lib/librte_eal/linuxapp/eal
  CC eal_vfio.o
/download/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:1535:1: error: no
previous prototype for 'rte_vfio_dma_map' [-Werror=missing-prototypes]
 rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t
iova,
 ^
/download/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:1542:1: error: no
previous prototype for 'rte_vfio_dma_unmap' [-Werror=missing-prototypes]
 rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused
iova,
 ^

As there is no use for those dummy functions without VFIO, remove them
completely.

Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: anatoly.bura...@intel.com

Signed-off-by: Shahaf Shuler 
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 589d7d4787..4163bd4e08 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1529,20 +1529,4 @@ rte_vfio_noiommu_is_enabled(void)
return c == 'Y';
 }
 
-#else
-
-int __rte_experimental
-rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t iova,
- __rte_unused uint64_t len)
-{
-   return -1;
-}
-
-int __rte_experimental
-rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
-   __rte_unused uint64_t len)
-{
-   return -1;
-}
-
 #endif
-- 
2.12.0



Re: [dpdk-dev] [PATCH v2 01/15] net/mlx5: support 16 hardware priorities

2018-04-12 Thread Xueming(Steven) Li


> -Original Message-
> From: Nélio Laranjeiro 
> Sent: Thursday, April 12, 2018 5:09 PM
> To: Xueming(Steven) Li 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> 
> On Tue, Apr 10, 2018 at 03:22:46PM +, Xueming(Steven) Li wrote:
> > Hi Nelio,
> >
> > > -Original Message-
> > > From: Nélio Laranjeiro 
> > > Sent: Tuesday, April 10, 2018 10:42 PM
> > > To: Xueming(Steven) Li 
> > > Cc: Shahaf Shuler ; dev@dpdk.org
> > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > priorities
> > >
> > > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > > Adjust flow priority mapping to adapt new hardware 16 verb flow
> > > > priorites support:
> > > > 0-3: RTE FLOW tunnel rule
> > > > 4-7: RTE FLOW non-tunnel rule
> > > > 8-15: PMD control flow
> > >
> > > This commit log is misleading: this number of priorities
> > > depends on the Mellanox OFED installed; it is not available in the
> > > upstream Linux kernel yet, nor in the current Mellanox OFED GA.
> > >
> > > What happens when that many priorities are not available? Is
> > > functionality removed?  Will it collide with other flows?
> >
> > If 16 priorities are not available, it simply behaves as with 8 priorities.
> 
> It is not described in the commit log, please add it.
> 
> > > > Signed-off-by: Xueming Li 
> 
> > > > },
> > > > [HASH_RXQ_ETH] = {
> > > > .hash_fields = 0,
> > > > .dpdk_rss_hf = 0,
> > > > -   .flow_priority = 3,
> > > > +   .flow_priority = 2,
> > > > },
> > > >  };
> > >
> > > If the amount of priorities remains 8, you are removing the priority
> > > for the tunnel flows introduced by commit 749365717f5c ("net/mlx5:
> > > change tunnel flow priority")
> > >
> > > Please keep this functionality when this patch fails to get the
> > > expected
> > > 16 Verbs priorities.
> >
> > These priority shifts are different in the 16-priority scenario, so I
> > changed it to a calculation. In function mlx5_flow_priorities_detect(),
> > the priority shift will be 1 with 8 priorities and 4 with 16 priorities.
> > Please refer to the changes in function mlx5_flow_update_priority() as well.
> 
> Please light my lamp, I don't see it...

Sorry, please refer to priv->config.flow_priority_shift.

> 
> 
> > > >  static void
> > > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > > + struct mlx5_flow_parse *parser,
> > > >   const struct rte_flow_attr *attr)  {
> > > > +   struct priv *priv = dev->data->dev_private;
> > > > unsigned int i;
> > > > +   uint16_t priority;
> > > >
> > > > +   if (priv->config.flow_priority_shift == 1)
> > > > +   priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > > +   else
> > > > +   priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > > +   if (!parser->inner)
> > > > +   priority += priv->config.flow_priority_shift;

Here, for a non-tunnel flow, the priority is lowered (numerically increased)
by 1 in the 8-priority case and by 4 otherwise.
I'll append a comment here.

> > > > if (parser->drop) {
> > > > -   parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> > > > -   attr->priority +
> > > > -   hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > > > +   parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> priority +
> > > > +   
> > > > hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > > > return;
> > > > }
> > > > for (i = 0; i != hash_rxq_init_n; ++i) {
> > > > -   if (parser->queue[i].ibv_attr) {
> > > > -   parser->queue[i].ibv_attr->priority =
> > > > -   attr->priority +
> > > > -   hash_rxq_init[i].flow_priority -
> > > > -   (parser->inner ? 1 : 0);
> > > > -   }
> > > > +   if (!parser->queue[i].ibv_attr)
> > > > +   continue;
> > > > +   parser->queue[i].ibv_attr->priority = priority +
> > > > +   hash_rxq_init[i].flow_priority;
> 
> Previous code was subtracting one from the table priorities which was
> starting at 1.  In the new code I don't see it.
> 
> What I am missing?

Please refer to new comment above around variable "priority" calculation.

> 
> > > > }
> > > >  }
> > > >
> > > > @@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > > > .layer = HASH_RXQ_ETH,
> > > > .mark_id = MLX5_FLOW_MARK_DEFAULT,
> > > > };
> > > > -   ret = mlx5_flow_convert_attributes(attr, error);
> > > > +   ret = mlx5_flow_convert_attributes(dev, attr, error);
> > > > if (ret)
> > > > retur

Re: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to improve zero copy performance

2018-04-12 Thread Ananyev, Konstantin


> -Original Message-
> From: Richardson, Bruce
> Sent: Thursday, April 12, 2018 2:12 PM
> To: Zhang, Qi Z 
> Cc: Ananyev, Konstantin ; Chen, Junjie J 
> ; Xing, Beilei ;
> dev@dpdk.org; c...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to improve 
> zero copy performance
> 
> On Thu, Apr 12, 2018 at 12:20:07PM +, Zhang, Qi Z wrote:
> > Hi Junjie:
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, April 12, 2018 7:52 PM
> > > To: Chen, Junjie J ; Xing, Beilei
> > > ; Zhang, Qi Z 
> > > Cc: dev@dpdk.org; Chen, Junjie J ; c...@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to
> > > improve zero copy performance
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Junjie Chen
> > > > Sent: Thursday, April 12, 2018 6:32 AM
> > > > To: Xing, Beilei ; Zhang, Qi Z
> > > > 
> > > > Cc: dev@dpdk.org; Chen, Junjie J ;
> > > > c...@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] net/i40e: update tx_free_threshold to
> > > > improve zero copy performance
> > > >
> > > > From: "Chen, Junjie" 
> > > >
> > > > When the vhost backend works in dequeue zero copy mode, the NIC holds
> > > > virtio's buffers until tx_free_threshold or fewer free buffers
> > > > remain, and only then frees a burst of Tx buffers. This causes packet
> > > > drops on the virtio side and hurts zero copy performance. So we need to
> > > > increase tx_free_threshold to let the NIC free virtio's buffers as soon
> > > > as possible.
> > > > We also keep the upper limit at the Tx max burst size to ensure the least
> > > > performance impact on the non zero copy path.
> > >
> > > Ok but why vhost app can't just use tx_queue_setup() to specify desired 
> > > value
> > > for tx_free_thresh?
> > > Why instead we have to modify PMD to satisfy needs of one app?
> > > Konstantin
> >
> > I think the commit log could include the explanation that this change is
> > proven not to impact the driver's performance and that it reduces the total
> > memory locked by PMD Tx, so it basically benefits any application sharing
> > the same mempool; vhost dequeue zero copy is one example.
> >
> > >
> > > >
> > > > Signed-off-by: Chen, Junjie 
> > > > ---
> > > >  drivers/net/i40e/i40e_rxtx.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > > b/drivers/net/i40e/i40e_rxtx.c index 56a854cec..d9569bdc9 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > > @@ -2039,6 +2039,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev
> > > *dev,
> > > > tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
> > > > tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> > > > tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> > > > +   if (tx_free_thresh < nb_desc - I40E_TX_MAX_BURST)
> > > > +   tx_free_thresh = nb_desc - I40E_TX_MAX_BURST;
> >
> > I think we'd better still allow the application to set tx_free_thresh, since a
> > small tx_free_thresh may still help the driver handle the
> > first strike after the device restarts.
> > So, nb_desc - I40E_TX_MAX_BURST should only be set when tx_conf->tx_rs_thresh
> > = 0
> >
> > Regards
> > Qi
> >
> +1 for just changing in this case.
> 
Basically you suggest changing DEFAULT_TX_FREE_THRESH.
Are you sure that it wouldn't impact any application on any platform (IA, arm,
etc.)?
As I remember, we already had a similar conversation a few years ago.
Again, if memory serves me right, one of the counter-arguments about setting
that value too high
was that the PMD might start to check the DD bit inside the TXD too often - and
will collide with HW updating it more often.
As I remember, it was suggested to use 1/2 or 3/4 of nb_desc as the default.
Though I still don't see what is wrong with setting tx_free_thresh via
queue_setup() for that particular case.
In that case we can be sure that no other stuff will be affected.
After all - that's why it is configurable.
Konstantin



Re: [dpdk-dev] [PATCH v2 01/15] net/mlx5: support 16 hardware priorities

2018-04-12 Thread Nélio Laranjeiro
On Thu, Apr 12, 2018 at 01:43:04PM +, Xueming(Steven) Li wrote:
> 
> 
> > -Original Message-
> > From: Nélio Laranjeiro 
> > Sent: Thursday, April 12, 2018 5:09 PM
> > To: Xueming(Steven) Li 
> > Cc: Shahaf Shuler ; dev@dpdk.org
> > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> > 
> > On Tue, Apr 10, 2018 at 03:22:46PM +, Xueming(Steven) Li wrote:
> > > Hi Nelio,
> > >
> > > > -Original Message-
> > > > From: Nélio Laranjeiro 
> > > > Sent: Tuesday, April 10, 2018 10:42 PM
> > > > To: Xueming(Steven) Li 
> > > > Cc: Shahaf Shuler ; dev@dpdk.org
> > > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > > priorities
> > > >
> > > > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > > > Adjust flow priority mapping to adapt new hardware 16 verb flow
> > > > > priorites support:
> > > > > 0-3: RTE FLOW tunnel rule
> > > > > 4-7: RTE FLOW non-tunnel rule
> > > > > 8-15: PMD control flow
> > > >
> > > > This commit log is misleading: this number of priorities
> > > > depends on the Mellanox OFED installed; it is not available in the
> > > > upstream Linux kernel yet, nor in the current Mellanox OFED GA.
> > > >
> > > > What happens when that many priorities are not available? Is
> > > > functionality removed?  Will it collide with other flows?
> > >
> > > If 16 priorities are not available, it simply behaves as with 8 priorities.
> > 
> > It is not described in the commit log, please add it.
> > 
> > > > > Signed-off-by: Xueming Li 
> > 
> > > > >   },
> > > > >   [HASH_RXQ_ETH] = {
> > > > >   .hash_fields = 0,
> > > > >   .dpdk_rss_hf = 0,
> > > > > - .flow_priority = 3,
> > > > > + .flow_priority = 2,
> > > > >   },
> > > > >  };
> > > >
> > > > If the amount of priorities remains 8, you are removing the priority
> > > > for the tunnel flows introduced by commit 749365717f5c ("net/mlx5:
> > > > change tunnel flow priority")
> > > >
> > > > Please keep this functionality when this patch fails to get the
> > > > expected
> > > > 16 Verbs priorities.
> > >
> > > These priority shifts are different in the 16-priority scenario, so I
> > > changed it to a calculation. In function mlx5_flow_priorities_detect(),
> > > the priority shift will be 1 with 8 priorities and 4 with 16 priorities.
> > > Please refer to the changes in function mlx5_flow_update_priority() as well.
> > 
> > Please light my lamp, I don't see it...
> 
> Sorry, please refer to priv->config.flow_priority_shift.
> 
> > 
> > 
> > > > >  static void
> > > > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > > > +   struct mlx5_flow_parse *parser,
> > > > > const struct rte_flow_attr *attr)  {
> > > > > + struct priv *priv = dev->data->dev_private;
> > > > >   unsigned int i;
> > > > > + uint16_t priority;
> > > > >
> > > > > + if (priv->config.flow_priority_shift == 1)
> > > > > + priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > > > + else
> > > > > + priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > > > + if (!parser->inner)
> > > > > + priority += priv->config.flow_priority_shift;
> 
> Here, for a non-tunnel flow, the priority is lowered (numerically increased)
> by 1 in the 8-priority case and by 4 otherwise.
> I'll append a comment here.

Thanks, I totally missed this one.


> > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > > > > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688
> > > > > 100644
> > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >   int ret;
> > > > >
> > > > >   dev->data->dev_started = 1;
> > > > > - ret = mlx5_flow_create_drop_queue(dev);
> > > > > - if (ret) {
> > > > > - DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > > > > - dev->data->port_id, strerror(rte_errno));
> > > > > - goto error;
> > > > > - }
> > > > >   DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx
> > queues",
> > > > >   dev->data->port_id);
> > > > >   rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > > > > mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >   mlx5_traffic_disable(dev);
> > > > >   mlx5_txq_stop(dev);
> > > > >   mlx5_rxq_stop(dev);
> > > > > - mlx5_flow_delete_drop_queue(dev);
> > > > >   rte_errno = ret; /* Restore rte_errno. */
> > > > >   return -rte_errno;
> > > > >  }
> > > > > @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
> > > > >   mlx5_rxq_stop(dev);
> > > > >   for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv-
> > >mr))
> > > > >   mlx5_mr_release(mr);
> > > > > - mlx5_flow_delete_drop_queue(dev);
> > > > >  }
> > > 

Re: [dpdk-dev] [PATCH] examples/l3fwd: adding event queue support

2018-04-12 Thread Bruce Richardson
On Thu, Apr 12, 2018 at 06:09:04AM +, Sunil Kumar Kori wrote:
> Gentle reminder to review the RFC.
> 
> Regards
> Sunil Kumar
> 

Hi,

sorry for the delay in review.

/Bruce

> -Original Message-
> From: Sunil Kumar Kori [mailto:sunil.k...@nxp.com] 
> Sent: Monday, March 19, 2018 7:15 PM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori ; Hemant Agrawal 
> 
> Subject: [PATCH] examples/l3fwd: adding event queue support
> 
> This patch set adds eventdev based queue mode support to
> the l3fwd application.
> 1. Eventdev support with parallel queue
> 2. Eventdev support with atomic queue
> 
> This patch adds
> - New command line parameter is added named as "dequeue-mode" which
>   identifies dequeue method i.e. dequeue via eventdev or polling
>   (default is polling)
> . If dequeue mode is via:
>  a. eventdev: New parameters are added -e, -a, -l  to cater
>   eventdev config, adapter config and link configuration
> respectively. "--config" option will be invalid in this case.
>  b. poll mode: It will work as of existing way and option for
>   eventdev parameters(-e, -a, -l) will be invalid.
> 
> - Functions are added in l3fwd_em.c and l3fwd_lpm.c for packet I/O
>   operation
> 
> The main purpose of this RFC is to get comments on the approach.
> This is *untested* code.
> 
> Signed-off-by: Sunil Kumar Kori 
> ---
>  examples/l3fwd/Makefile |   2 +-
>  examples/l3fwd/l3fwd.h  |  21 ++
>  examples/l3fwd/l3fwd_em.c   | 100 
>  examples/l3fwd/l3fwd_eventdev.c | 541 
> 
>  examples/l3fwd/l3fwd_eventdev.h |  85 +++
>  examples/l3fwd/l3fwd_lpm.c  | 100 
>  examples/l3fwd/main.c   | 318 +++
>  examples/l3fwd/meson.build  |   2 +-
>  8 files changed, 1120 insertions(+), 49 deletions(-)  create mode 100644 
> examples/l3fwd/l3fwd_eventdev.c  create mode 100644 
> examples/l3fwd/l3fwd_eventdev.h
> 

My initial impression is that this seems like an awful lot of new code just
to support reading from an eventdev rather than from an ethdev. Looking at
the datapath main function loop, is the only difference there that
rte_eth_rx_burst has been changed to rte_eventdev_dequeue_burst or are
there more significant changes than that?

If this is the case, is this scale of changes really needed to this app?
What about the other examples, how many of them will need to be similarly
updated?

I'm also wondering if it would help, or be useful, to have a vdev type
which wraps an eventdev queue as an ethdev. That would eliminate the need
for the datapath code, and may help abstract away some parts of the setup.
It would also help with re-use if you anticipate wanting to make a similar
change to other apps.

/Bruce



Re: [dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support

2018-04-12 Thread Burakov, Anatoly

On 12-Apr-18 8:19 AM, Xiao Wang wrote:

Currently the EAL VFIO framework binds the VFIO group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put the VFIO group
into a separate container and program the IOMMU via this container.

This patch adds some APIs to support container creating and device
binding with a container.

A driver could use "rte_vfio_create_container" helper to create a
new container from eal, use "rte_vfio_bind_group" to bind a device
to the newly created container.

During rte_vfio_setup_device, the container bound with the device
will be used for IOMMU setup.

Signed-off-by: Junjie Chen 
Signed-off-by: Xiao Wang 
Reviewed-by: Maxime Coquelin 
Reviewed-by: Ferruh Yigit 
---


Apologies for late review. Some comments below.

<...>

  
+struct rte_memseg;

+
  /**
   * Setup vfio_cfg for the device identified by its address.
   * It discovers the configured I/O MMU groups or sets a new one for the 
device.
@@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
  }
  #endif
  


<...>


+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Perform dma mapping for devices in a container.
+ *
+ * @param container_fd
+ *   the specified container fd
+ *
+ * @param dma_type
+ *   the dma map type
+ *
+ * @param ms
+ *   the dma address region to map
+ *
+ * @return
+ *0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_dma_map(int container_fd, int dma_type, const struct rte_memseg *ms);
+


First of all, why memseg instead of va/iova/len? This seems like an
unnecessary attachment to the internals of DPDK's memory representation. Not
all memory comes in memsegs; this makes the API unnecessarily specific
to DPDK memory.


Also, why provide the DMA type? There's already a VFIO type pointer in
vfio_config - you can set this pointer for every newly created container,
so the user wouldn't have to care about the IOMMU type. Is it not possible
to figure out the DMA type from within EAL VFIO? If not, maybe provide an
API to do so, e.g. rte_vfio_container_set_dma_type()?


This will also need to be rebased on top of latest HEAD because there 
already is a similar DMA map/unmap API added, only without the container 
parameter. Perhaps rename these new functions to 
rte_vfio_container_(create|destroy|dma_map|dma_unmap)?



+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Perform dma unmapping for devices in a container.
+ *
+ * @param container_fd
+ *   the specified container fd
+ *
+ * @param dma_type
+ *the dma map type
+ *
+ * @param ms
+ *   the dma address region to unmap
+ *
+ * @return
+ *0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_dma_unmap(int container_fd, int dma_type, const struct rte_memseg 
*ms);
+
  #endif /* VFIO_PRESENT */
  


<...>


@@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
if (vfio_group_fd < 0) {
/* if file not found, it's not an error */
if (errno != ENOENT) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", 
filename,
-   strerror(errno));
+   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+   filename, strerror(errno));


This looks like unintended change.


return -1;
}
  
@@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)

vfio_group_fd = open(filename, O_RDWR);
if (vfio_group_fd < 0) {
if (errno != ENOENT) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: 
%s\n", filename,
-   strerror(errno));
+   RTE_LOG(ERR, EAL,
+   "Cannot open %s: %s\n",
+   filename,
+   strerror(errno));


This looks like unintended change.


return -1;
}
return 0;
@@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
/* noiommu group found */
}
  
-		cur_grp->group_no = iommu_group_no;

-   cur_grp->fd = vfio_group_fd;
-   vfio_cfg.vfio_active_groups++;
return vfio_group_fd;
}
-   /* if we're in a secondary process, request group fd from the primary
+   /*
+* if we're in a secondary process, request group fd from the primary
 * process via our socket
 */


This looks like unintended change.


else {
-   int socket_fd, ret;
-
-   socket

Re: [dpdk-dev] [PATCH] eal: fix compilation without VFIO

2018-04-12 Thread Burakov, Anatoly

On 12-Apr-18 2:34 PM, Shahaf Shuler wrote:

A compilation error occurred when compiling with CONFIG_RTE_EAL_VFIO=n:

== Build lib/librte_eal/linuxapp/eal
   CC eal_vfio.o
/download/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:1535:1: error: no
previous prototype for 'rte_vfio_dma_map' [-Werror=missing-prototypes]
  rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t
iova,
  ^
/download/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:1542:1: error: no
previous prototype for 'rte_vfio_dma_unmap' [-Werror=missing-prototypes]
  rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused
iova,
  ^

As there is no use for those dummy functions without VFIO, remove them
completely.

Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: anatoly.bura...@intel.com

Signed-off-by: Shahaf Shuler 
---
  lib/librte_eal/linuxapp/eal/eal_vfio.c | 16 
  1 file changed, 16 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 589d7d4787..4163bd4e08 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1529,20 +1529,4 @@ rte_vfio_noiommu_is_enabled(void)
return c == 'Y';
  }
  
-#else

-
-int __rte_experimental
-rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t iova,
- __rte_unused uint64_t len)
-{
-   return -1;
-}
-
-int __rte_experimental
-rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
-   __rte_unused uint64_t len)
-{
-   return -1;
-}
-
  #endif



These functions are part of public API, like rest of functions in this 
header. They're in the map file. Should we perhaps go the BSD way and 
provide EAL with dummy prototypes as well? See bsdapp/eal/eal.c:763 onwards.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] examples/l3fwd: adding event queue support

2018-04-12 Thread Jerin Jacob
-Original Message-
> Date: Thu, 12 Apr 2018 15:03:22 +0100
> From: Bruce Richardson 
> To: Sunil Kumar Kori 
> CC: "dev@dpdk.org" , Hemant Agrawal 
> Subject: Re: [dpdk-dev] [PATCH] examples/l3fwd: adding event queue support
> User-Agent: Mutt/1.9.4 (2018-02-28)
> 
> On Thu, Apr 12, 2018 at 06:09:04AM +, Sunil Kumar Kori wrote:
> > Gentle reminder to review the RFC.
> > 
> > Regards
> > Sunil Kumar
> > 
> 
> Hi,
> 
> sorry for the delay in review.
> 
> /Bruce
> 
> > -Original Message-
> > From: Sunil Kumar Kori [mailto:sunil.k...@nxp.com] 
> > Sent: Monday, March 19, 2018 7:15 PM
> > To: dev@dpdk.org
> > Cc: Sunil Kumar Kori ; Hemant Agrawal 
> > 
> > Subject: [PATCH] examples/l3fwd: adding event queue support
> > 
> > This patch set adds eventdev based queue mode support to
> > the l3fwd application.
> > 1. Eventdev support with parallel queue
> > 2. Eventdev support with atomic queue
> > 
> > This patch adds
> > - New command line parameter is added named as "dequeue-mode" which
> >   identifies dequeue method i.e. dequeue via eventdev or polling
> >   (default is polling)
> > . If dequeue mode is via:
> >  a. eventdev: New parameters are added -e, -a, -l  to cater
> > eventdev config, adapter config and link configuration
> > respectively. "--config" option will be invalid in this case.
> >  b. poll mode: It will work as of existing way and option for
> > eventdev parameters(-e, -a, -l) will be invalid.
> > 
> > - Functions are added in l3fwd_em.c and l3fwd_lpm.c for packet I/O
> >   operation
> > 
> > The main purpose of this RFC is to get comments on the approach.
> > This is *untested* code.
> > 
> > Signed-off-by: Sunil Kumar Kori 
> > ---
> >  examples/l3fwd/Makefile |   2 +-
> >  examples/l3fwd/l3fwd.h  |  21 ++
> >  examples/l3fwd/l3fwd_em.c   | 100 
> >  examples/l3fwd/l3fwd_eventdev.c | 541 
> > 
> >  examples/l3fwd/l3fwd_eventdev.h |  85 +++
> >  examples/l3fwd/l3fwd_lpm.c  | 100 
> >  examples/l3fwd/main.c   | 318 +++
> >  examples/l3fwd/meson.build  |   2 +-
> >  8 files changed, 1120 insertions(+), 49 deletions(-)  create mode 100644 
> > examples/l3fwd/l3fwd_eventdev.c  create mode 100644 
> > examples/l3fwd/l3fwd_eventdev.h
> > 
> 
> My initial impression is that this seems like an awful lot of new code just
> to support reading from an eventdev rather than from an ethdev. Looking at
> the datapath main function loop, is the only difference there that
> rte_eth_rx_burst has been changed to rte_eventdev_dequeue_burst or are
> there more significant changes than that?
> 
> If this is the case, is this scale of changes really needed to this app?
> What about the other examples, how many of them will need to be similarly
> updated?
> 
> I'm also wondering if it would help, or be useful, to have a vdev type
> which wraps an eventdev queue as an ethdev. That would eliminate the need
> for the datapath code, and may help abstract away some parts of the setup.
> It would also help with re-use if you anticipate wanting to make a similar
> change to other apps.

Exposing it as an ethdev vdev would introduce a cyclic build dependency
(eventdev already depends on ethdev). I think a helper function in the
eventdev area to set up the Rx adapter and similar slow path logic may work.


> 
> /Bruce
> 


Re: [dpdk-dev] [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification

2018-04-12 Thread Xueming(Steven) Li


> -Original Message-
> From: Nélio Laranjeiro 
> Sent: Thursday, April 12, 2018 5:51 PM
> To: Xueming(Steven) Li 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> identification
> 
> On Wed, Apr 11, 2018 at 08:11:50AM +, Xueming(Steven) Li wrote:
> > Hi Nelio,
> >
> > > -Original Message-
> > > From: Nélio Laranjeiro 
> > > Sent: Tuesday, April 10, 2018 11:17 PM
> > > To: Xueming(Steven) Li 
> > > Cc: Shahaf Shuler ; dev@dpdk.org
> > > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > > identification
> > >
> > > On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > > > This patch introduced tunnel type identification based on flow rules.
> > > > If flows of multiple tunnel types built on same queue,
> > > > RTE_PTYPE_TUNNEL_MASK will be returned, bits in flow mark could be
> > > > used as tunnel type identifier.
> > >
> > > I don't see anywhere in this patch where the bits are reserved to
> > > identify a flow, nor values which can help to identify it.
> > >
> > > Is this missing?
> > >
> > > Anyway we have already very few bits in the mark making it difficult
> > > to be used by the user, reserving again some to may lead to remove
> > > the mark support from the flows.
> >
> > Not all users will use multiple tunnel types, this is not included in
> > this patch set and left to user decision. I'll update comments to make
> this clear.
> 
> Thanks,
> 
> > > > Signed-off-by: Xueming Li 
> 
> > > >  /**
> > > > + * RXQ update after flow rule creation.
> > > > + *
> > > > + * @param dev
> > > > + *   Pointer to Ethernet device.
> > > > + * @param flow
> > > > + *   Pointer to the flow rule.
> > > > + */
> > > > +static void
> > > > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct
> > > > +rte_flow
> > > > +*flow) {
> > > > +   struct priv *priv = dev->data->dev_private;
> > > > +   unsigned int i;
> > > > +
> > > > +   if (!dev->data->dev_started)
> > > > +   return;
> > > > +   for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > > > +   struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > > > +[(*flow->queues)[i]];
> > > > +   struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > +   container_of(rxq_data, struct mlx5_rxq_ctrl, 
> > > > rxq);
> > > > +   uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > > > +
> > > > +   rxq_data->mark |= flow->mark;
> > > > +   if (!tunnel)
> > > > +   continue;
> > > > +   rxq_ctrl->tunnel_types[tunnel] += 1;
> > >
> > > I don't understand why you need such array, the NIC is unable to
> > > return the tunnel type has it returns only one bit saying tunnel.
> > > Why don't it store in the priv structure the current configured tunnel?
> >
> > This array is used to count tunnel types bound to queue, if only one
> > tunnel type, ptype will report that tunnel type, TUNNEL MASK(max
> > value) will be returned if multiple types bound to a queue.
> >
> > Flow rss action specifies queues that binding to tunnel, thus we can't
> > assume all queues have same tunnel types, so this is a per queue
> structure.
> 
> There is something I am missing here, how in the dataplane the PMD can
> understand from 1 bit which kind of tunnel the packet is matching?

The code below this line is the answer; let me quote it here:
	if (rxq_data->tunnel != flow->tunnel)
		rxq_data->tunnel = rxq_data->tunnel ?
				   RTE_PTYPE_TUNNEL_MASK :
				   flow->tunnel;
If no tunnel type is associated to the rxq yet, the tunnel type from the flow is used.
If the flow brings a new, different tunnel type, RTE_PTYPE_TUNNEL_MASK is used.

> 
> 
> > > > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > struct mlx5_flows *list)  {
> > > > struct priv *priv = dev->data->dev_private;
> > > > struct rte_flow *flow;
> > > > +   unsigned int i;
> > > >
> > > > TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > > > -   unsigned int i;
> > > > struct mlx5_ind_table_ibv *ind_tbl = NULL;
> > > >
> > > > if (flow->drop) {
> > > > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > struct
> > > mlx5_flows *list)
> > > > DRV_LOG(DEBUG, "port %u flow %p removed", dev->data-
> >port_id,
> > > > (void *)flow);
> > > > }
> > > > +   /* Cleanup Rx queue tunnel info. */
> > > > +   for (i = 0; i != priv->rxqs_n; ++i) {
> > > > +   struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > > > +   struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > +   container_of(q, struct mlx5_rxq_ctrl, rxq);
> > > > +
> > > > +   memset((void *)rxq_ctrl->tunnel_types, 0,
> > > > +  sizeof(rxq

Re: [dpdk-dev] [PATCH] lib/librte_hash: fix incorrect comment for lookup

2018-04-12 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com]
> Sent: Thursday, April 12, 2018 1:34 PM
> To: Richardson, Bruce ; De Lara Guarch, Pablo
> 
> Cc: dev@dpdk.org; Shreyansh Jain 
> Subject: [PATCH] lib/librte_hash: fix incorrect comment for lookup
> 
> rte_hash_lookup_with_hash() has wrong comment for its 'sig' param.
> 
> Fixes: 1a9f648be291 ("hash: fix for multi-process apps")
> 
> Signed-off-by: Shreyansh Jain 

Acked-by: Pablo de Lara 

Also, this should be backported to the stable branch, so I CC sta...@dpdk.org.


[dpdk-dev] [PATCH 0/2] net/mlx5: fix flow director mask

2018-04-12 Thread Nelio Laranjeiro
The flow director mask has been mistakenly removed from the mlx5 PMD. This series
brings it back.

Nelio Laranjeiro (2):
  net/mlx5: split L3/L4 in flow director
  net/mlx5: fix flow director mask

 drivers/net/mlx5/mlx5_flow.c | 155 ---
 1 file changed, 69 insertions(+), 86 deletions(-)

-- 
2.17.0



[dpdk-dev] [PATCH 1/2] net/mlx5: split L3/L4 in flow director

2018-04-12 Thread Nelio Laranjeiro
This will help bring back the mask handler, which was removed when this
feature was rewritten on top of rte_flow.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_flow.c | 112 ---
 1 file changed, 37 insertions(+), 75 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 7ef68de49..7ba643b83 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2695,8 +2695,11 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
return -rte_errno;
}
attributes->queue.index = fdir_filter->action.rx_queue;
+   /* Handle L3. */
switch (fdir_filter->input.flow_type) {
case RTE_ETH_FLOW_NONFRAG_IPV4_UDP:
+   case RTE_ETH_FLOW_NONFRAG_IPV4_TCP:
+   case RTE_ETH_FLOW_NONFRAG_IPV4_OTHER:
attributes->l3.ipv4.hdr = (struct ipv4_hdr){
.src_addr = input->flow.udp4_flow.ip.src_ip,
.dst_addr = input->flow.udp4_flow.ip.dst_ip,
@@ -2704,15 +2707,44 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
.type_of_service = input->flow.udp4_flow.ip.tos,
.next_proto_id = input->flow.udp4_flow.ip.proto,
};
-   attributes->l4.udp.hdr = (struct udp_hdr){
-   .src_port = input->flow.udp4_flow.src_port,
-   .dst_port = input->flow.udp4_flow.dst_port,
-   };
attributes->items[1] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_IPV4,
.spec = &attributes->l3,
.mask = &attributes->l3,
};
+   break;
+   case RTE_ETH_FLOW_NONFRAG_IPV6_UDP:
+   case RTE_ETH_FLOW_NONFRAG_IPV6_TCP:
+   case RTE_ETH_FLOW_NONFRAG_IPV6_OTHER:
+   attributes->l3.ipv6.hdr = (struct ipv6_hdr){
+   .hop_limits = input->flow.udp6_flow.ip.hop_limits,
+   .proto = input->flow.udp6_flow.ip.proto,
+   };
+   memcpy(attributes->l3.ipv6.hdr.src_addr,
+  input->flow.udp6_flow.ip.src_ip,
+  RTE_DIM(attributes->l3.ipv6.hdr.src_addr));
+   memcpy(attributes->l3.ipv6.hdr.dst_addr,
+  input->flow.udp6_flow.ip.dst_ip,
+  RTE_DIM(attributes->l3.ipv6.hdr.src_addr));
+   attributes->items[1] = (struct rte_flow_item){
+   .type = RTE_FLOW_ITEM_TYPE_IPV6,
+   .spec = &attributes->l3,
+   .mask = &attributes->l3,
+   };
+   break;
+   default:
+   DRV_LOG(ERR, "port %u invalid flow type%d",
+   dev->data->port_id, fdir_filter->input.flow_type);
+   rte_errno = ENOTSUP;
+   return -rte_errno;
+   }
+   /* Handle L4. */
+   switch (fdir_filter->input.flow_type) {
+   case RTE_ETH_FLOW_NONFRAG_IPV4_UDP:
+   attributes->l4.udp.hdr = (struct udp_hdr){
+   .src_port = input->flow.udp4_flow.src_port,
+   .dst_port = input->flow.udp4_flow.dst_port,
+   };
attributes->items[2] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_UDP,
.spec = &attributes->l4,
@@ -2720,62 +2752,21 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
};
break;
case RTE_ETH_FLOW_NONFRAG_IPV4_TCP:
-   attributes->l3.ipv4.hdr = (struct ipv4_hdr){
-   .src_addr = input->flow.tcp4_flow.ip.src_ip,
-   .dst_addr = input->flow.tcp4_flow.ip.dst_ip,
-   .time_to_live = input->flow.tcp4_flow.ip.ttl,
-   .type_of_service = input->flow.tcp4_flow.ip.tos,
-   .next_proto_id = input->flow.tcp4_flow.ip.proto,
-   };
attributes->l4.tcp.hdr = (struct tcp_hdr){
.src_port = input->flow.tcp4_flow.src_port,
.dst_port = input->flow.tcp4_flow.dst_port,
};
-   attributes->items[1] = (struct rte_flow_item){
-   .type = RTE_FLOW_ITEM_TYPE_IPV4,
-   .spec = &attributes->l3,
-   .mask = &attributes->l3,
-   };
attributes->items[2] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_TCP,
.spec = &attributes->l4,
.mask = &attributes->l4,
};
break;
-   case RTE_ETH_FLOW_NONFRAG_IPV4_OTHER:
-   attributes->l3.ipv4.hdr = (struct ipv4_hdr){
-   .src_addr = input->flow.ip4_flow.src_ip,
-   .dst_add

[dpdk-dev] [PATCH 2/2] net/mlx5: fix flow director mask

2018-04-12 Thread Nelio Laranjeiro
During the transition to resurrect flow director on top of rte_flow, mask
handling was removed by mistake.

Fixes: 4c3e9bcdd52e ("net/mlx5: support flow director")
Cc: sta...@dpdk.org

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_flow.c | 59 
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 7ba643b83..5e75afa7f 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2661,6 +2661,9 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
 {
struct priv *priv = dev->data->dev_private;
const struct rte_eth_fdir_input *input = &fdir_filter->input;
+   const struct rte_eth_fdir_masks *mask =
+   &dev->data->dev_conf.fdir_conf.mask;
+   unsigned int i;
 
/* Validate queue number. */
if (fdir_filter->action.rx_queue >= priv->rxqs_n) {
@@ -2701,11 +2704,16 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
case RTE_ETH_FLOW_NONFRAG_IPV4_TCP:
case RTE_ETH_FLOW_NONFRAG_IPV4_OTHER:
attributes->l3.ipv4.hdr = (struct ipv4_hdr){
-   .src_addr = input->flow.udp4_flow.ip.src_ip,
-   .dst_addr = input->flow.udp4_flow.ip.dst_ip,
-   .time_to_live = input->flow.udp4_flow.ip.ttl,
-   .type_of_service = input->flow.udp4_flow.ip.tos,
-   .next_proto_id = input->flow.udp4_flow.ip.proto,
+   .src_addr = input->flow.udp4_flow.ip.src_ip &
+   mask->ipv4_mask.src_ip,
+   .dst_addr = input->flow.udp4_flow.ip.dst_ip &
+   mask->ipv4_mask.dst_ip,
+   .time_to_live = input->flow.udp4_flow.ip.ttl &
+   mask->ipv4_mask.ttl,
+   .type_of_service = input->flow.udp4_flow.ip.tos &
+   mask->ipv4_mask.ttl,
+   .next_proto_id = input->flow.udp4_flow.ip.proto &
+   mask->ipv4_mask.proto,
};
attributes->items[1] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_IPV4,
@@ -2720,12 +2728,17 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
.hop_limits = input->flow.udp6_flow.ip.hop_limits,
.proto = input->flow.udp6_flow.ip.proto,
};
-   memcpy(attributes->l3.ipv6.hdr.src_addr,
-  input->flow.udp6_flow.ip.src_ip,
-  RTE_DIM(attributes->l3.ipv6.hdr.src_addr));
-   memcpy(attributes->l3.ipv6.hdr.dst_addr,
-  input->flow.udp6_flow.ip.dst_ip,
-  RTE_DIM(attributes->l3.ipv6.hdr.src_addr));
+
+   for (i = 0;
+i != RTE_DIM(attributes->l3.ipv6.hdr.src_addr);
+++i) {
+   attributes->l3.ipv6.hdr.src_addr[i] =
+   input->flow.udp6_flow.ip.src_ip[i] &
+   mask->ipv6_mask.src_ip[i];
+   attributes->l3.ipv6.hdr.dst_addr[i] =
+   input->flow.udp6_flow.ip.dst_ip[i] &
+   mask->ipv6_mask.dst_ip[i];
+   }
attributes->items[1] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_IPV6,
.spec = &attributes->l3,
@@ -2742,8 +2755,10 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
switch (fdir_filter->input.flow_type) {
case RTE_ETH_FLOW_NONFRAG_IPV4_UDP:
attributes->l4.udp.hdr = (struct udp_hdr){
-   .src_port = input->flow.udp4_flow.src_port,
-   .dst_port = input->flow.udp4_flow.dst_port,
+   .src_port = input->flow.udp4_flow.src_port &
+   mask->src_port_mask,
+   .dst_port = input->flow.udp4_flow.dst_port &
+   mask->dst_port_mask,
};
attributes->items[2] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_UDP,
@@ -2753,8 +2768,10 @@ mlx5_fdir_filter_convert(struct rte_eth_dev *dev,
break;
case RTE_ETH_FLOW_NONFRAG_IPV4_TCP:
attributes->l4.tcp.hdr = (struct tcp_hdr){
-   .src_port = input->flow.tcp4_flow.src_port,
-   .dst_port = input->flow.tcp4_flow.dst_port,
+   .src_port = input->flow.tcp4_flow.src_port &
+   mask->src_port_mask,
+   .dst_port = input->flow.tcp4_flow.dst_port &
+   mask->dst_port_mask,
};
attributes->items[2] = (struct rte

[dpdk-dev] [PATCH 0/2] testpmd: simulating noisy host environment

2018-04-12 Thread Jens Freimann

This patch set proposes enhancements to testpmd to simulate
more realistic behavior of a guest machine engaged in receiving
and sending packets while performing a Virtual Network Function (VNF).

The goal is to enable a simple way of measuring the performance impact on
cache and memory footprint utilization from various VNFs co-located on the
same host machine.

This series of patches adds the new command line switches to
testpmd:

--buffersize-before-sending [packet numbers]

Keep the mbufs in a FIFO and forward the overflowing packets from the
FIFO. This queue is per Tx queue (after all other packet processing).

--flush-timer [delay]
Flush the packet queue if no packets have been seen during
[delay]. As long as packets are seen, the timer is reset.


Options to simulate route lookups:

--memory-footprint [size]
Size of the VNF internal memory (MB) in which the random
reads/writes will be done, allocated by rte_malloc (hugepages).

--random-w-memory-access-per-packet [num]
Number of random writes in memory to perform per packet,
simulating hit-flag updates. 64 bits per write,
all writes in different cache lines.

--random-r-memory-access-per-packet [num]
Number of random reads in memory to perform per packet,
simulating FIB/table lookups. 64 bits per read,
all reads in different cache lines.

--random-rw-memory-access-per-packet [num]
Number of random reads and writes in memory to perform per
packet, simulating stats updates. 64 bits per read-write, all
reads and writes in different cache lines.

Comments are appreciated. 

Should parameter names be prefixed so they
won't be confused with other functionality? 

regards,
Jens 


Jens Freimann (2):
  testpmd: add parameters buffersize-before-send and flush-timeout
  testpmd: add code to simulate noisy neighbour memory usage

 app/test-pmd/fifo.h   |  43 ++
 app/test-pmd/iofwd.c  | 109 +-
 app/test-pmd/parameters.c |  57 +++-
 app/test-pmd/testpmd.c|  78 +
 app/test-pmd/testpmd.h|  20 +
 config/common_base|   1 +
 6 files changed, 306 insertions(+), 2 deletions(-)
 create mode 100644 app/test-pmd/fifo.h

-- 
2.14.3



[dpdk-dev] [PATCH 1/2] testpmd: add parameters buffersize-before-send and flush-timeout

2018-04-12 Thread Jens Freimann
Create a FIFO to buffer received packets. Once it overflows, put
those packets into the actual Tx queue. The FIFO is created per Tx
queue and its size can be set with the --buffersize-before-sending
commandline parameter.

A second commandline parameter is used to set a timeout in
milliseconds after which the fifo is flushed.

--buffersize-before-sending [packet numbers]
Keep the mbufs in a FIFO and forward the overflowing packets from the
FIFO. This queue is per Tx queue (after all other packet processing).

--flush-timer [delay]
Flush the packet queue if no packets have been seen during
[delay]. As long as packets are seen, the timer is reset.

Signed-off-by: Jens Freimann 
---
 app/test-pmd/fifo.h   | 43 ++
 app/test-pmd/iofwd.c  | 59 ++-
 app/test-pmd/parameters.c | 21 -
 app/test-pmd/testpmd.c| 48 ++
 app/test-pmd/testpmd.h| 15 
 config/common_base|  1 +
 6 files changed, 185 insertions(+), 2 deletions(-)
 create mode 100644 app/test-pmd/fifo.h

diff --git a/app/test-pmd/fifo.h b/app/test-pmd/fifo.h
new file mode 100644
index 0..01415f98c
--- /dev/null
+++ b/app/test-pmd/fifo.h
@@ -0,0 +1,43 @@
+#ifndef __FIFO_H
+#define __FIFO_H
+#include 
+#include 
+#include 
+#include "testpmd.h"
+
+#define FIFO_COUNT_MAX 1024
+/**
+ * Add elements to fifo. Return number of written elements
+ */
+static inline unsigned
+fifo_put(struct rte_ring *r, struct rte_mbuf **data, unsigned num)
+{
+
+   return rte_ring_enqueue_burst(r, (void **)data, num, NULL);
+}
+
+/**
+ * Get elements from fifo. Return number of read elements
+ */
+static inline unsigned
+fifo_get(struct rte_ring *r, struct rte_mbuf **data, unsigned num)
+{
+   return rte_ring_dequeue_burst(r, (void **) data, num, NULL);
+}
+
+static inline unsigned
+fifo_count(struct rte_ring *r)
+{
+   return rte_ring_count(r);
+}
+
+static inline int
+fifo_full(struct rte_ring *r)
+{
+   return rte_ring_full(r);
+}
+
+
+struct rte_ring * fifo_init(uint32_t qi, uint32_t pi);
+
+#endif 
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 9dce76efe..85fa000f7 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -36,6 +36,7 @@
 #include 
 
 #include "testpmd.h"
+#include "fifo.h"
 
 /*
  * Forwarding of packets in I/O mode.
@@ -48,7 +49,7 @@ pkt_burst_io_forward(struct fwd_stream *fs)
 {
struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
uint16_t nb_rx;
-   uint16_t nb_tx;
+   uint16_t nb_tx = 0;
uint32_t retry;
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
@@ -56,6 +57,15 @@ pkt_burst_io_forward(struct fwd_stream *fs)
uint64_t end_tsc;
uint64_t core_cycles;
 #endif
+#ifdef RTE_TEST_PMD_NOISY
+   const uint64_t freq_khz = rte_get_timer_hz() / 1000;
+   struct noisy_config *ncf = &noisy_cfg[fs->tx_queue];
+   struct rte_mbuf *tmp_pkts[MAX_PKT_BURST];
+   uint16_t nb_enqd;
+   uint16_t nb_deqd = 0;
+   uint64_t delta_ms;
+   uint64_t now;
+#endif
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
start_tsc = rte_rdtsc();
@@ -73,8 +83,55 @@ pkt_burst_io_forward(struct fwd_stream *fs)
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
 #endif
+#ifdef RTE_TEST_PMD_NOISY
+   if (bsize_before_send > 0) {
+   if (rte_ring_free_count(ncf->f) >= nb_rx) {
+   /* enqueue into fifo */
+   nb_enqd = fifo_put(ncf->f, pkts_burst, nb_rx);
+   if (nb_enqd < nb_rx)
+   nb_rx = nb_enqd;
+   } else {
+   /* fifo is full, dequeue first */
+   nb_deqd = fifo_get(ncf->f, tmp_pkts, nb_rx);
+   /* enqueue into fifo */
+   nb_enqd = fifo_put(ncf->f, pkts_burst, nb_deqd);
+   if (nb_enqd < nb_rx)
+   nb_rx = nb_enqd;
+   if (nb_deqd > 0)
+   nb_tx = rte_eth_tx_burst(fs->tx_port,
+   fs->tx_queue, tmp_pkts,
+   nb_deqd);
+   }
+   } else {
+   nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
+   pkts_burst, nb_rx);
+   }
+
+   /*
+* TX burst queue drain
+*/
+   if (ncf->prev_time == 0) {
+   now = ncf->prev_time = rte_get_timer_cycles();
+   } else {
+   now = rte_get_timer_cycles();
+   }
+   delta_ms = (now - ncf->prev_time) / freq_khz;
+   if (unlikely(delta_ms >= flush_timer) && flush_timer > 0 && (nb_tx == 
0)) {
+   while (fifo_count(ncf->f) > 0) {
+   nb_deqd = fifo_get(ncf->f, tmp_pkts, nb_rx);
+   nb_tx = rte_eth_tx_burst(fs->tx_

[dpdk-dev] [PATCH 2/2] testpmd: add code to simulate noisy neighbour memory usage

2018-04-12 Thread Jens Freimann
Add several options to simulate route lookups (memory reads) in tables
that can be quite large, as well as route hit statistics updates.
These options simulate the whole stack traversal and
will thrash the cache. Memory accesses are random.

Options to simulate route lookups:

--memory-footprint [size]
Size of the VNF internal memory (MB) in which the random
reads/writes will be done, allocated by rte_malloc (hugepages).

--nb-rnd-write [num]
Number of random writes in memory to perform per packet,
simulating hit-flag updates. 64 bits per write,
all writes in different cache lines.

--nb-rnd-read [num]
Number of random reads in memory to perform per packet,
simulating FIB/table lookups. 64 bits per read,
all reads in different cache lines.

--nb-rnd-read-write [num]
Number of random reads and writes in memory to perform per
packet, simulating stats updates. 64 bits per read-write, all
reads and writes in different cache lines.

Signed-off-by: Jens Freimann 
---
 app/test-pmd/iofwd.c  | 50 +++
 app/test-pmd/parameters.c | 36 ++
 app/test-pmd/testpmd.c| 30 
 app/test-pmd/testpmd.h|  5 +
 4 files changed, 121 insertions(+)

diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 85fa000f7..e69727e2c 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -34,10 +34,57 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "testpmd.h"
 #include "fifo.h"
 
+static inline void
+do_write(char *vnf_mem)
+{
+   uint64_t i = rte_rand();
+   uint64_t w = rte_rand();
+
+   vnf_mem[i % ((vnf_memory_footprint * 1024 * 1024 ) /
+   RTE_CACHE_LINE_SIZE)] = w;
+}
+
+static inline void
+do_read(char *vnf_mem)
+{
+   uint64_t i = rte_rand();
+   uint64_t r = 0;
+
+   r = vnf_mem[i % ((vnf_memory_footprint * 1024 * 1024 ) /
+   RTE_CACHE_LINE_SIZE)];
+   r++;
+}
+
+static inline void
+do_rw(char *vnf_mem)
+{
+   do_read(vnf_mem);
+   do_write(vnf_mem);
+}
+
+/*
+ * Simulate route lookups as defined by commandline parameters
+ */
+static void
+sim_memory_lookups(struct noisy_config *ncf, uint16_t nb_pkts)
+{
+   uint16_t i,j;
+
+   for (i = 0; i < nb_pkts; i++) {
+   for (j = 0; j < nb_rnd_write; j++)
+   do_write(ncf->vnf_mem);
+   for (j = 0; j < nb_rnd_read; j++)
+   do_read(ncf->vnf_mem);
+   for (j = 0; j < nb_rnd_read_write; j++)
+   do_rw(ncf->vnf_mem);
+   }
+}
+
 /*
  * Forwarding of packets in I/O mode.
  * Forward packets "as-is".
@@ -107,6 +154,9 @@ pkt_burst_io_forward(struct fwd_stream *fs)
pkts_burst, nb_rx);
}
 
+   /* simulate noisy vnf by trashing cache lines, simulate route lookups */
+   sim_memory_lookups(ncf, nb_rx);
+
/*
 * TX burst queue drain
 */
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index df0db933a..78e146164 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -623,6 +623,10 @@ launch_args_parse(int argc, char** argv)
{ "tx-offloads",1, 0, 0 },
{ "buffersize-before-sending",  1, 0, 0 },
{ "flush-timer",1, 0, 0 },
+   { "memory-footprint",   1, 0, 0 },
+   { "nb-rnd-write",   1, 0, 0 },
+   { "nb-rnd-read",1, 0, 0 },
+   { "nb-rnd-read-write",  1, 0, 0 },
{ 0, 0, 0, 0 },
};
 
@@ -1120,6 +1124,38 @@ launch_args_parse(int argc, char** argv)
rte_exit(EXIT_FAILURE,
 "flush-timer must be > 0\n");
}
+   if (!strcmp(lgopts[opt_idx].name, "memory-footprint")) {
+   n = atoi(optarg);
+   if (n > 0)
+   vnf_memory_footprint = (uint16_t) n;
+   else
+   rte_exit(EXIT_FAILURE,
+"memory-footprint must be > 
0\n");
+   }
+   if (!strcmp(lgopts[opt_idx].name, "nb-rnd-write")) {
+   n = atoi(optarg);
+   if (n > 0)
+   nb_rnd_write = (uint16_t) n;
+   else
+   rte_exit(EXIT_FAILURE,
+"nb-rnd-write must be > 0\n");
+   }
+   if (!strcmp(lgopts[opt_idx].name, "nb-rnd-read")) {
+   n = atoi(optarg);
+   if (n

Re: [dpdk-dev] [PATCH] eal: add request to map reserved physical memory

2018-04-12 Thread Burakov, Anatoly

On 28-Mar-18 5:51 AM, Ajit Khaparde wrote:

From: Srinath Mannam 

Reserved physical memory is requested from the kernel
and mapped to user space.
This memory is then mapped to IOVA using VFIO,
and provided to SPDK to allocate NVMe CQs.

Signed-off-by: Srinath Mannam 
Signed-off-by: Scott Branden 
Signed-off-by: Ajit Khaparde 
---


Hi Srinath,

I've seen this kind of approach implemented before to add additional 
memory types to DPDK (redefining "unused" socket IDs to mean something 
else), and I don't like it.


What would be better is to design a new API to support different memory 
types. Some groundwork for this was already laid for this release 
(switching to memseg lists), but more changes will be needed down the 
line. My ideal approach would be to have pluggable memory allocators. 
I've outlined some of my thoughts on this before [1], you're welcome to 
join/continue that discussion, and make sure whatever comes out of it is 
going to be useful for all of us :) I was planning to (attempt to) 
restart that discussion, and this seems like as good an opportunity to 
do that as any other.


Now that the memory hotplug stuff is merged, I'll hopefully get more 
time for prototyping.


So, as it is, it's a NACK from me, but let's work together on something 
better :)


[1] http://dpdk.org/ml/archives/dev/2018-February/090937.html

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v2 01/15] net/mlx5: support 16 hardware priorities

2018-04-12 Thread Xueming(Steven) Li


> -Original Message-
> From: Nélio Laranjeiro 
> Sent: Thursday, April 12, 2018 10:03 PM
> To: Xueming(Steven) Li 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> 
> On Thu, Apr 12, 2018 at 01:43:04PM +, Xueming(Steven) Li wrote:
> >
> >
> > > -Original Message-
> > > From: Nélio Laranjeiro 
> > > Sent: Thursday, April 12, 2018 5:09 PM
> > > To: Xueming(Steven) Li 
> > > Cc: Shahaf Shuler ; dev@dpdk.org
> > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > priorities
> > >
> > > On Tue, Apr 10, 2018 at 03:22:46PM +, Xueming(Steven) Li wrote:
> > > > Hi Nelio,
> > > >
> > > > > -Original Message-
> > > > > From: Nélio Laranjeiro 
> > > > > Sent: Tuesday, April 10, 2018 10:42 PM
> > > > > To: Xueming(Steven) Li 
> > > > > Cc: Shahaf Shuler ; dev@dpdk.org
> > > > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > > > priorities
> > > > >
> > > > > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > > > > Adjust flow priority mapping to adapt new hardware 16 verb
> > > > > > flow priorites support:
> > > > > > 0-3: RTE FLOW tunnel rule
> > > > > > 4-7: RTE FLOW non-tunnel rule
> > > > > > 8-15: PMD control flow
> > > > >
> > > > > This commit log is inducing people in error, this amount of
> > > > > priority depends on the Mellanox OFED installed, it is not
> > > > > available on upstream Linux kernel yet nor in the current Mellanox
> OFED GA.
> > > > >
> > > > > What happens when those amount of priority are not available, is
> > > > > it removing a functionality?  Will it collide with other flows?
> > > >
> > > > If 16  priorities not available, simply behavior as 8 priorities.
> > >
> > > It is not described in the commit log, please add it.
> > >
> > > > > > Signed-off-by: Xueming Li 
> > > 
> > > > > > },
> > > > > > [HASH_RXQ_ETH] = {
> > > > > > .hash_fields = 0,
> > > > > > .dpdk_rss_hf = 0,
> > > > > > -   .flow_priority = 3,
> > > > > > +   .flow_priority = 2,
> > > > > > },
> > > > > >  };
> > > > >
> > > > > If the amount of priorities remains 8, you are removing the
> > > > > priority for the tunnel flows introduced by commit 749365717f5c
> ("net/mlx5:
> > > > > change tunnel flow priority")
> > > > >
> > > > > Please keep this functionality when this patch fails to get the
> > > > > expected
> > > > > 16 Verbs priorities.
> > > >
> > > > These priority shift are different in 16 priorities scenario, I
> > > > changed it to calculation. In function
> > > > mlx5_flow_priorities_detect(), priority shift will be 1 if 8
> priorities, 4 in case of 16 priorities.
> > > > Please refer to changes in function mlx5_flow_update_priority() as
> well.
> > >
> > > Please light my lamp, I don't see it...
> >
> > Sorry, please refer to priv->config.flow_priority_shift.
> >
> > >
> > > 
> > > > > >  static void
> > > > > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > > > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > > > > + struct mlx5_flow_parse *parser,
> > > > > >   const struct rte_flow_attr *attr)  {
> > > > > > +   struct priv *priv = dev->data->dev_private;
> > > > > > unsigned int i;
> > > > > > +   uint16_t priority;
> > > > > >
> > > > > > +   if (priv->config.flow_priority_shift == 1)
> > > > > > +   priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > > > > +   else
> > > > > > +   priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > > > > +   if (!parser->inner)
> > > > > > +   priority += priv->config.flow_priority_shift;
> >
> > Here, if non-tunnel flow, lower(increase) 1 for 8 priorities, lower 4
> otherwise.
> > I'll append a comment here.
> 
> Thanks, I totally missed this one.
> 
> 
> > > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688
> > > > > > 100644
> > > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > > int ret;
> > > > > >
> > > > > > dev->data->dev_started = 1;
> > > > > > -   ret = mlx5_flow_create_drop_queue(dev);
> > > > > > -   if (ret) {
> > > > > > -   DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > > > > > -   dev->data->port_id, strerror(rte_errno));
> > > > > > -   goto error;
> > > > > > -   }
> > > > > > DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx
> > > queues",
> > > > > > dev->data->port_id);
> > > > > > rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > > > > > mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > > mlx5_traffic_disable(dev);
> > > > > > mlx5_txq_stop(dev);
> > > > > > mlx5_rxq_stop(dev);
> > > > > > -   mlx5_flow_delete_drop_queue(dev);
> > > > 

Re: [dpdk-dev] [PATCH 1/2] testpmd: add parameters buffersize-before-send and flush-timeout

2018-04-12 Thread Ananyev, Konstantin
Hi,

> 
> Create a fifo to buffer received packets. Once it flows over put
> those packets into the actual tx queue. The fifo is created per tx
> queue and its size can be set with the --buffersize-before-sending
> commandline parameter.
> 
> A second commandline parameter is used to set a timeout in
> milliseconds after which the fifo is flushed.
> 
> --buffersize-before-sending [packet numbers]
> Keep the mbuf in a FIFO and forward the over flooding packets from the
> FIFO. This queue is per TX-queue (after all other packet processing).
> 
> --flush-timer [delay]
> Flush the packet queue if no packets have been seen during
> [delay]. As long as packets are seen, the timer is reset.
> 

I understand your desire to have some realistic fwd scenario,
but why does it all have to be put into iofwd mode?
iofwd is the simplest one, mainly used to test raw PMD performance
in nearly ideal conditions.
Why not create your own forwarding mode (as most people do)?
That way you'll have your 'real world app' test scenario,
while keeping the iofwd code small and simple.
Konstantin 

> Signed-off-by: Jens Freimann 
> ---
>  app/test-pmd/fifo.h   | 43 ++
>  app/test-pmd/iofwd.c  | 59 
> ++-
>  app/test-pmd/parameters.c | 21 -
>  app/test-pmd/testpmd.c| 48 ++
>  app/test-pmd/testpmd.h| 15 
>  config/common_base|  1 +
>  6 files changed, 185 insertions(+), 2 deletions(-)
>  create mode 100644 app/test-pmd/fifo.h
> 
> diff --git a/app/test-pmd/fifo.h b/app/test-pmd/fifo.h
> new file mode 100644
> index 0..01415f98c
> --- /dev/null
> +++ b/app/test-pmd/fifo.h
> @@ -0,0 +1,43 @@
> +#ifndef __FIFO_H
> +#define __FIFO_H
> +#include 
> +#include 
> +#include 
> +#include "testpmd.h"
> +
> +#define FIFO_COUNT_MAX 1024
> +/**
> + * Add elements to fifo. Return number of written elements
> + */
> +static inline unsigned
> +fifo_put(struct rte_ring *r, struct rte_mbuf **data, unsigned num)
> +{
> +
> + return rte_ring_enqueue_burst(r, (void **)data, num, NULL);
> +}
> +
> +/**
> + * Get elements from fifo. Return number of read elements
> + */
> +static inline unsigned
> +fifo_get(struct rte_ring *r, struct rte_mbuf **data, unsigned num)
> +{
> + return rte_ring_dequeue_burst(r, (void **) data, num, NULL);
> +}
> +
> +static inline unsigned
> +fifo_count(struct rte_ring *r)
> +{
> + return rte_ring_count(r);
> +}
> +
> +static inline int
> +fifo_full(struct rte_ring *r)
> +{
> + return rte_ring_full(r);
> +}
> +
> +
> +struct rte_ring * fifo_init(uint32_t qi, uint32_t pi);
> +
> +#endif
> diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
> index 9dce76efe..85fa000f7 100644
> --- a/app/test-pmd/iofwd.c
> +++ b/app/test-pmd/iofwd.c
> @@ -36,6 +36,7 @@
>  #include 
> 
>  #include "testpmd.h"
> +#include "fifo.h"
> 
>  /*
>   * Forwarding of packets in I/O mode.
> @@ -48,7 +49,7 @@ pkt_burst_io_forward(struct fwd_stream *fs)
>  {
>   struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
>   uint16_t nb_rx;
> - uint16_t nb_tx;
> + uint16_t nb_tx = 0;
>   uint32_t retry;
> 
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> @@ -56,6 +57,15 @@ pkt_burst_io_forward(struct fwd_stream *fs)
>   uint64_t end_tsc;
>   uint64_t core_cycles;
>  #endif
> +#ifdef RTE_TEST_PMD_NOISY
> + const uint64_t freq_khz = rte_get_timer_hz() / 1000;
> + struct noisy_config *ncf = &noisy_cfg[fs->tx_queue];
> + struct rte_mbuf *tmp_pkts[MAX_PKT_BURST];
> + uint16_t nb_enqd;
> + uint16_t nb_deqd = 0;
> + uint64_t delta_ms;
> + uint64_t now;
> +#endif
> 
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>   start_tsc = rte_rdtsc();
> @@ -73,8 +83,55 @@ pkt_burst_io_forward(struct fwd_stream *fs)
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
>   fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
>  #endif
> +#ifdef RTE_TEST_PMD_NOISY
> + if (bsize_before_send > 0) {
> + if (rte_ring_free_count(ncf->f) >= nb_rx) {
> + /* enqueue into fifo */
> + nb_enqd = fifo_put(ncf->f, pkts_burst, nb_rx);
> + if (nb_enqd < nb_rx)
> + nb_rx = nb_enqd;
> + } else {
> + /* fifo is full, dequeue first */
> + nb_deqd = fifo_get(ncf->f, tmp_pkts, nb_rx);
> + /* enqueue into fifo */
> + nb_enqd = fifo_put(ncf->f, pkts_burst, nb_deqd);
> + if (nb_enqd < nb_rx)
> + nb_rx = nb_enqd;
> + if (nb_deqd > 0)
> + nb_tx = rte_eth_tx_burst(fs->tx_port,
> + fs->tx_queue, tmp_pkts,
> + nb_deqd);
> + }
> + } else {
> + nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
> +  

Re: [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated

2018-04-12 Thread Burakov, Anatoly

On 26-Mar-18 5:09 PM, Andrew Rybchenko wrote:

Size of memory chunk required to populate mempool objects depends
on how objects are stored in the memory. Different mempool drivers
may have different requirements, and a new operation makes it possible
to calculate memory size in accordance with driver requirements and to
advertise requirements on minimum memory chunk size and alignment
in a generic way.

Bump ABI version since the patch breaks it.

Suggested-by: Olivier Matz 
Signed-off-by: Andrew Rybchenko 
---


Hi Andrew,

<...>


-   total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-   size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-   mp->flags);
+   size_t min_chunk_size;
+
+   mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+   &min_chunk_size, &align);
+   if (mem_size < 0) {
+   ret = mem_size;
+   goto fail;
+   }
  
  		ret = snprintf(mz_name, sizeof(mz_name),

RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -606,7 +600,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
goto fail;
}
  
-		mz = rte_memzone_reserve_aligned(mz_name, size,

+   mz = rte_memzone_reserve_aligned(mz_name, mem_size,
mp->socket_id, mz_flags, align);
/* not enough memory, retry with the biggest zone we have */
if (mz == NULL)
@@ -617,6 +611,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
goto fail;
}
  
+		if (mz->len < min_chunk_size) {

+   rte_memzone_free(mz);
+   ret = -ENOMEM;
+   goto fail;
+   }
+
if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG)
iova = RTE_BAD_IOVA;


OK by me, but needs to be rebased.


else
@@ -649,13 +649,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
  static size_t
  get_anon_size(const struct rte_mempool *mp)
  {
-   size_t size, total_elt_sz, pg_sz, pg_shift;
+   size_t size, pg_sz, pg_shift;
+   size_t min_chunk_size;
+   size_t align;
  
  	pg_sz = getpagesize();


<...>

  
+/**

+ * Calculate memory size required to store given number of objects.
+ *
+ * If mempool objects are not required to be IOVA-contiguous
+ * (the flag MEMPOOL_F_NO_IOVA_CONTIG is set), min_chunk_size defines
+ * virtually contiguous chunk size. Otherwise, if mempool objects must
+ * be IOVA-contiguous (the flag MEMPOOL_F_NO_IOVA_CONTIG is clear),
+ * min_chunk_size defines IOVA-contiguous chunk size.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical page size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+   uint32_t obj_num,  uint32_t pg_shift,
+   size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just the product of the total
+ * object size (including header and trailer) and the number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than the page size, then it assumes
+ * that pages are grouped in subsets of physically contiguous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+   uint32_t obj_num, uint32_t pg_shift,
+   size_t *min_chunk_size, size_t *align);


For API docs and wording,

Acked-by: Anatoly Burakov 

Should be pretty straightforward to rebase, so you probably should keep 
my ack for v4.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v6] vfio: change to use generic multi-process channel

2018-04-12 Thread Burakov, Anatoly

On 20-Mar-18 8:50 AM, Jianfeng Tan wrote:

Previously, vfio used its own private channel for the secondary
process to get container fd and group fd from the primary process.

This patch changes to use the generic mp channel.

Test:
   1. Bind two NICs to vfio-pci.

   2. Start the primary and secondary process.
 $ (symmetric_mp) -c 2 -- -p 3 --num-procs=2 --proc-id=0
 $ (symmetric_mp) -c 4 --proc-type=auto -- -p 3 \
--num-procs=2 --proc-id=1

Cc: anatoly.bura...@intel.com

Signed-off-by: Jianfeng Tan 
---
v5->v6: (Address comments from Anatoly)
   - Naming, return checking, logging.
   - Move vfio action register after rte_bus_probe().


Acked-by: Anatoly Burakov 

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support

2018-04-12 Thread Wang, Xiao W
Hi Anatoly,

> -Original Message-
> From: Burakov, Anatoly
> Sent: Thursday, April 12, 2018 10:04 PM
> To: Wang, Xiao W ; Yigit, Ferruh
> 
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; Wang, Zhihong
> ; Bie, Tiwei ; Tan, Jianfeng
> ; Liang, Cunming ; Daly,
> Dan ; tho...@monjalon.net; gaetan.ri...@6wind.com;
> hemant.agra...@nxp.com; Chen, Junjie J 
> Subject: Re: [PATCH v6 1/4] eal/vfio: add multiple container support
> 
> On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> > Currently eal vfio framework binds vfio group fd to the default
> > container fd during rte_vfio_setup_device, while in some cases,
> > e.g. vDPA (vhost data path acceleration), we want to put vfio group
> > to a separate container and program IOMMU via this container.
> >
> > This patch adds some APIs to support container creation and device
> > binding with a container.
> >
> > A driver could use "rte_vfio_create_container" helper to create a
> > new container from eal, use "rte_vfio_bind_group" to bind a device
> > to the newly created container.
> >
> > During rte_vfio_setup_device, the container bound with the device
> > will be used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen 
> > Signed-off-by: Xiao Wang 
> > Reviewed-by: Maxime Coquelin 
> > Reviewed-by: Ferruh Yigit 
> > ---
> 
> Apologies for late review. Some comments below.
> 
> <...>
> 
> >
> > +struct rte_memseg;
> > +
> >   /**
> >* Setup vfio_cfg for the device identified by its address.
> >* It discovers the configured I/O MMU groups or sets a new one for the
> device.
> > @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
> >   }
> >   #endif
> >
> 
> <...>
> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Perform DMA mapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *   the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to map
> > + *
> > + * @return
> > + *0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_map(int container_fd, int dma_type, const struct
> rte_memseg *ms);
> > +
> 
> First of all, why memseg, instead of va/iova/len? This seems like
> unnecessary attachment to internals of DPDK memory representation. Not
> all memory comes in memsegs, this makes the API unnecessarily specific
> to DPDK memory.

Agree, will use va/iova/len.

> 
> Also, why providing DMA type? There's already a VFIO type pointer in
> vfio_config - you can set this pointer for every new created container,
> so the user wouldn't have to care about IOMMU type. Is it not possible
> to figure out DMA type from within EAL VFIO? If not, maybe provide an
> API to do so, e.g. rte_vfio_container_set_dma_type()?

It's possible, EAL VFIO should be able to figure out a container's DMA type.

> 
> This will also need to be rebased on top of latest HEAD because there
> already is a similar DMA map/unmap API added, only without the container
> parameter. Perhaps rename these new functions to
> rte_vfio_container_(create|destroy|dma_map|dma_unmap)?

OK, will check the latest HEAD and rebase on that.

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Perform DMA unmapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to unmap
> > + *
> > + * @return
> > + *0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct
> rte_memseg *ms);
> > +
> >   #endif /* VFIO_PRESENT */
> >
> 
> <...>
> 
> > @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
> > if (vfio_group_fd < 0) {
> > /* if file not found, it's not an error */
> > if (errno != ENOENT) {
> > -   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> filename,
> > -   strerror(errno));
> > +   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> > +   filename, strerror(errno));
> 
> This looks like unintended change.
> 
> > return -1;
> > }
> >
> > @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
> > vfio_group_fd = open(filename, O_RDWR);
> > if (vfio_group_fd < 0) {
> > if (errno != ENOENT) {
> > -   RTE_LOG(ERR, EAL, "Cannot
> open %s: %s\n", filename,
> > -   strerror(errno));
> > +   RTE_LOG(ERR, EAL,
> > +   "Ca

Re: [dpdk-dev] [PATCH v2 2/3] net/szedata2: add support for new NIC

2018-04-12 Thread Ferruh Yigit
On 4/12/2018 8:41 AM, Matej Vido wrote:
> + if (pci_dev->id.device_id == PCI_DEVICE_ID_NETCOPE_NFB200G2QL) {
> + unsigned int i;
> + unsigned int rx_queues = max_rx_queues / max_ports;
> + unsigned int tx_queues = max_tx_queues / max_ports;
> +
> + /*
> +  * Number of queues reported by szedata_ifaces_available()
> +  * is the number of all queues from all DMA controllers which
> +  * may reside at different numa locations.
> +  * All queues from the same DMA controller have the same numa
> +  * node.
> +  * Numa node from the first queue of each DMA controller is
> +  * retrieved.
> +  * If the numa node differs from the numa node of the queues
> +  * from the previous DMA controller the queues are assigned
> +  * to the next port.
> +  */
> +
> + for (i = 0; i < max_ports; i++) {
> + int numa_rx = szedata_get_area_numa_node(szedata_temp,
> + SZE2_DIR_RX, rx_queues * i);
> + int numa_tx = szedata_get_area_numa_node(szedata_temp,

Hi Matej,

Where szedata_get_area_numa_node() is defined?
Is it possible that you are missing a patch?

Thanks,
ferruh


Re: [dpdk-dev] [PATCH] net/tap: remove queue specific offload support

2018-04-12 Thread Ferruh Yigit
On 4/5/2018 6:49 PM, Thomas Monjalon wrote:
> Pascal, Moti, Ophir,
> please comment.

Hi Moti,

Any comment? This has been asked many times now.

> 
> 22/03/2018 19:28, Ferruh Yigit:
>> It is not clear if tap PMD supports queue specific offloads, removing
>> the related code.
>>
>> Fixes: 95ae196ae10b ("net/tap: use new Rx offloads API")
>> Fixes: 818fe14a9891 ("net/tap: use new Tx offloads API")
>> Cc: mo...@mellanox.com
>>
>> Signed-off-by: Ferruh Yigit 
> 
> 
> 



Re: [dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support

2018-04-12 Thread Burakov, Anatoly

On 12-Apr-18 5:07 PM, Wang, Xiao W wrote:

Hi Anatoly,



<...>



Also, why providing DMA type? There's already a VFIO type pointer in
vfio_config - you can set this pointer for every new created container,
so the user wouldn't have to care about IOMMU type. Is it not possible
to figure out DMA type from within EAL VFIO? If not, maybe provide an
API to do so, e.g. rte_vfio_container_set_dma_type()?


It's possible, EAL VFIO should be able to figure out a container's DMA type.


You probably won't be able to do it until you add a group into the 
container, so probably best place to do it would be on group_bind?


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 1/2] testpmd: add parameters buffersize-before-send and flush-timeout

2018-04-12 Thread Ferruh Yigit
On 4/12/2018 3:57 PM, Ananyev, Konstantin wrote:
> Hi,
> 
>>
>> Create a fifo to buffer received packets. Once it flows over put
>> those packets into the actual tx queue. The fifo is created per tx
>> queue and its size can be set with the --buffersize-before-sending
>> commandline parameter.
>>
>> A second commandline parameter is used to set a timeout in
>> milliseconds after which the fifo is flushed.
>>
>> --buffersize-before-sending [packet numbers]
>> Keep the mbuf in a FIFO and forward the over flooding packets from the
>> FIFO. This queue is per TX-queue (after all other packet processing).
>>
>> --flush-timer [delay]
>> Flush the packet queue if no packets have been seen during
>> [delay]. As long as packets are seen, the timer is reset.
>>
> 
> I understand your desire to have some realistic fwd scenario,
> but why does it all have to be put in iofwd mode?
> iofwd is the simplest one, mainly used to test raw PMD performance
> in nearly ideal conditions.
> Why not to create your own forwarding mode (as most people do)?
> That way you'll have your 'real world app' test scenario,
> while keeping iofwd code small and simple.

+1 to having own forwarding mode for noisy neighbor, and leaving iofwd simple.

> Konstantin 
> 
>> Signed-off-by: Jens Freimann 

<...>
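For readers following the thread: the buffering scheme the patch describes — accumulate received packets in a per-Tx-queue FIFO, and drain it into the Tx queue once it overflows — can be sketched without any DPDK types. Single-packet granularity and all names here are simplifications for illustration, not the patch's code.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the per-Tx-queue buffering described above: received packets
 * accumulate in a FIFO; once it is full, the buffered packets are drained
 * ("sent") before the new one is queued. Packets are modelled as ints. */
#define FIFO_CAP 4

struct pkt_fifo {
	int buf[FIFO_CAP];
	unsigned int count;
};

/* Queue one packet; returns how many buffered packets were drained into
 * sent[] to make room (0 while the FIFO still has space). */
static unsigned int
fifo_push_or_drain(struct pkt_fifo *f, int pkt, int *sent)
{
	unsigned int nb_sent = 0;

	if (f->count == FIFO_CAP) {
		memcpy(sent, f->buf, sizeof(f->buf)); /* tx burst stand-in */
		nb_sent = f->count;
		f->count = 0;
	}
	f->buf[f->count++] = pkt;
	return nb_sent;
}
```

A real implementation would also flush on the --flush-timer timeout; that path is omitted here.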


Re: [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for flexible indirection

2018-04-12 Thread Ananyev, Konstantin
> >
> > > > >
> > > > > On Mon, Apr 09, 2018 at 06:04:34PM +0200, Olivier Matz wrote:
> > > > > > Hi Yongseok,
> > > > > >
> > > > > > On Tue, Apr 03, 2018 at 05:12:06PM -0700, Yongseok Koh wrote:
> > > > > > > On Tue, Apr 03, 2018 at 10:26:15AM +0200, Olivier Matz wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On Mon, Apr 02, 2018 at 11:50:03AM -0700, Yongseok Koh wrote:
> > > > > > > > > When attaching a mbuf, indirect mbuf has to point to start of 
> > > > > > > > > buffer of
> > > > > > > > > direct mbuf. By adding buf_off field to rte_mbuf, this 
> > > > > > > > > becomes more
> > > > > > > > > flexible. Indirect mbuf can point to any part of direct mbuf 
> > > > > > > > > by calling
> > > > > > > > > rte_pktmbuf_attach_at().
> > > > > > > > >
> > > > > > > > > Possible use-cases could be:
> > > > > > > > > - If a packet has multiple layers of encapsulation, multiple 
> > > > > > > > > indirect
> > > > > > > > >   buffers can reference different layers of the encapsulated 
> > > > > > > > > packet.
> > > > > > > > > - A large direct mbuf can even contain multiple packets in 
> > > > > > > > > series and
> > > > > > > > >   each packet can be referenced by multiple mbuf indirections.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Yongseok Koh 
> > > > > > > >
> > > > > > > > I think the current API is already able to do what you want.
> > > > > > > >
> > > > > > > > 1/ Here is a mbuf m with its data
> > > > > > > >
> > > > > > > >off
> > > > > > > ><-->
> > > > > > > >   len
> > > > > > > >   ++   <-->
> > > > > > > >   ||
> > > > > > > > +-|v--+
> > > > > > > > | |---|
> > > > > > > > m   | buf  |XXX  ||
> > > > > > > > |  ---|
> > > > > > > > +-+
> > > > > > > >
> > > > > > > >
> > > > > > > > 2/ clone m:
> > > > > > > >
> > > > > > > >   c = rte_pktmbuf_alloc(pool);
> > > > > > > >   rte_pktmbuf_attach(c, m);
> > > > > > > >
> > > > > > > >   Note that c has its own offset and length fields.
> > > > > > > >
> > > > > > > >
> > > > > > > >off
> > > > > > > ><-->
> > > > > > > >   len
> > > > > > > >   ++   <-->
> > > > > > > >   ||
> > > > > > > > +-|v--+
> > > > > > > > | |---|
> > > > > > > > m   | buf  |XXX  ||
> > > > > > > > |  ---|
> > > > > > > > +--^--+
> > > > > > > >|
> > > > > > > >   ++
> > > > > > > > indirect  |
> > > > > > > > +-|---+
> > > > > > > > | |---|
> > > > > > > > c   | buf  | ||
> > > > > > > > |  ---|
> > > > > > > > +-+
> > > > > > > >
> > > > > > > > offlen
> > > > > > > > <--><-->
> > > > > > > >
> > > > > > > >
> > > > > > > > 3/ remove some data from c without changing m
> > > > > > > >
> > > > > > > >rte_pktmbuf_adj(c, 10)   // at head
> > > > > > > >rte_pktmbuf_trim(c, 10)  // at tail
> > > > > > > >
> > > > > > > >
> > > > > > > > Please let me know if it fits your needs.
> > > > > > >
> > > > > > > No, it doesn't.
> > > > > > >
> > > > > > > Trimming head and tail with the current APIs removes data and 
> > > > > > > make the space
> > > > > > > available. Adjusting packet head means giving more headroom, not 
> > > > > > > shifting the
> > > > > > > buffer itself. If m has two indirect mbufs (c1 and c2) and those 
> > > > > > > are pointing to
> > > > > > > difference offsets in m,
> > > > > > >
> > > > > > > rte_pktmbuf_adj(c1, 10);
> > > > > > > rte_pktmbuf_adj(c2, 20);
> > > > > > >
> > > > > > > then the owner of c2 regard the first (off+20)B as available 
> > > > > > > headroom. If it
> > > > > > > wants to attach outer header, it will overwrite the headroom even 
> > > > > > > though the
> > > > > > > owner of c1 is still accessing it. Instead, another mbuf (h1) for 
> > > > > > > the outer
> > > > > > > header should be linked by h1->next = c2.
> > > > > >
> > > > > > Yes, after these operations c1, c2 and m should become read-only. 
> > > > > > So, to
> > > > > > prepend headers, another mbuf has to be inserted before as you 
> > > > > > suggest. It
> > > > > > is possible to wrap this in a function rte_pktmbuf_clone_area(m, 
> > > > > > offset,
> > > > > > length) that will:
> > > > > >   - alloc and attach indirect mbuf for each segment of m that is
> > > > > > in the range [offset : length+offset].
> > > > > >   - prepend an empty and writable mbuf for the headers
> > > > > >
> > > > > > > If c1 and c2 are attached with shifting buffer address

Re: [dpdk-dev] [PATCH] net/bonding: add rte flow support

2018-04-12 Thread Ferruh Yigit
On 3/28/2018 12:16 PM, Matan Azrad wrote:
> Ethernet devices which are grouped by bonding PMD, aka slaves, are
> sharing the same queues and RSS configurations and their Rx burst
> functions must be managed by the bonding PMD according to the bonding
> architecture.
> 
> So, it makes sense to configure the same flow rules for all the bond
> slaves to allow consistency in packet flow management.
> 
> Add rte flow support to the bonding PMD to manage all flow
> configuration to the bonded slaves.
> 
> Signed-off-by: Matan Azrad 

Hi Declan, Radu,

Any comment on the patch?

Thanks,
ferruh


Re: [dpdk-dev] [PATCH v7 0/2] app/testpmd: add new commands to test new Tx/Rx offloads

2018-04-12 Thread Ferruh Yigit
On 4/3/2018 9:57 AM, Wei Dai wrote:
> Existing testpmd commands can't support per-queue offload configuration.
> And there are different commands to enable or disable different offloading.
> This patch set add following commands to support new Tx/Rx offloading API 
> test.
> 
> To get Rx offload capability of a port, please run:
> testpmd > rx_offload get capability 
> 
> To get current Rx offload per queue and per port configuration of a port, run:
> tesstpmd > rx_offload get configuration 
> 
> To enable or disable a Rx per port offloading, please run:
> testpmd > rx_offload enable|disable per_port vlan_strip|ipv4_cksum|... 
> 
> This command will set|clear the associated bit in
> dev->dev_conf.rxmode.offloads
> for rte_eth_dev_configure and rx_conf->offloads of all Rx queues for
> rte_eth_rx_queue_setup().
> 
> To enable or disable a Rx per queue offloading, please run:
> testpmd > rx_offload enable|disable per_queue vlan_strip|ipv4_cksum|... 
>  
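The set|clear behaviour described in the quoted cover letter reduces to bit operations on the 64-bit offload masks passed to rte_eth_dev_configure and the queue setup functions. A minimal standalone sketch (the helper name is illustrative, not testpmd code):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: set or clear one offload flag in a 64-bit offloads
 * mask, as the proposed command does for dev_conf.rxmode.offloads and for
 * each queue's conf->offloads. */
static uint64_t
toggle_offload(uint64_t offloads, uint64_t offload_mask, int on)
{
	return on ? (offloads | offload_mask) : (offloads & ~offload_mask);
}
```

Applying the same helper to the port-level mask and to every queue's mask is what keeps the per-port and per-queue configuration consistent.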


Hi Wei,

When each feature adds its own command testpmd becomes harder to use and
commands get harder to remember.

I am against adding a new set of "[rt]x_offload" high level commands. This a
feature of ports and should be a sub-command of port commands.

From the scope of this patch it is hard to see the problem, but that becomes
clearer as you look into the whole testpmd command line.

There is already a command
"port config  ...",
"show port <...> "

so we can re-use them like:
"show port rx_offload_cap "

There is already "show port cap " to get configured offloads!

"port config  rx_offload vlan_strip|ipv4_cksum|... on|off"
"port config  queue  rx_offload vlan_strip|ipv4_cksum|... 
on|off"


or something similar but main idea is lets not create a new command, what do you
think?

> 
> Same commands like "tx_offload ..." are also added to support new Tx offload 
> API test.
> 
> Signed-off-by: Wei Dai 
> Acked-by: Jingjing Wu 
> 
> ---
> v7:
>update testpmd document
> v6:
>reconfig port and queues if offloading is enabled or disabled
> v5:
>don't depend on enum types defined in rte_ethdev.
> v4:
>improve testpmd command per port offload to set or clear the port 
> configuration
>and the queue configuration of all queues.
> v3:
>add enum rte_eth_rx_offload_type and enum rte_eth_tx_offload_type
>free memory of port->rx_offloads and port->tx_offloads when testpmd is 
> existed
> v2:
>use rte_eth_dev_rx_offload_name() and rte_eth_dev_tx_offload_name().
>remove static const strings of Rx/Tx offload names.
> 
> 
> Wei Dai (2):
>   app/testpmd: add commands to test new Rx offload API
>   app/testpmd: add commands to test new Tx offload API
> 
>  app/test-pmd/cmdline.c  | 759 
> 
>  app/test-pmd/testpmd.c  |  34 +-
>  app/test-pmd/testpmd.h  |   2 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  87 
>  4 files changed, 878 insertions(+), 4 deletions(-)
> 



Re: [dpdk-dev] [PATCH dpdk-next-net] net/axgbe: fix an assignment error in axgbe_dev_info_get()

2018-04-12 Thread Ferruh Yigit
On 4/9/2018 3:02 PM, Ferruh Yigit wrote:
> On 4/9/2018 2:56 PM, Kumar, Ravi1 wrote:
>>> This patch fixes a trivial error in assigning max Rx/Tx queues in 
>>> axgbe_dev_info_get() of the axgbe PMD driver. 
>>>
>>> Signed-off-by: Rami Rosen 
>>> ---
>>> drivers/net/axgbe/axgbe_ethdev.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/axgbe/axgbe_ethdev.c 
>>> b/drivers/net/axgbe/axgbe_ethdev.c
>>> index 07c1337ac..a9a9fb570 100644
>>> --- a/drivers/net/axgbe/axgbe_ethdev.c
>>> +++ b/drivers/net/axgbe/axgbe_ethdev.c
>>> @@ -355,8 +355,8 @@ axgbe_dev_info_get(struct rte_eth_dev *dev,
>>> struct axgbe_port *pdata = dev->data->dev_private;
>>>
>>> dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
>>> -   dev_info->max_rx_queues = pdata->tx_ring_count;
>>> -   dev_info->max_tx_queues = pdata->rx_ring_count;
>>> +   dev_info->max_rx_queues = pdata->rx_ring_count;
>>> +   dev_info->max_tx_queues = pdata->tx_ring_count;
>>> dev_info->min_rx_bufsize = AXGBE_RX_MIN_BUF_SIZE;
>>> dev_info->max_rx_pktlen = AXGBE_RX_MAX_BUF_SIZE;
>>> dev_info->max_mac_addrs = AXGBE_MAX_MAC_ADDRS;
>>> --
>>> 2.14.3
>>>
>>
>> Thanks a lot Rami. Wonderful catch. 
> 
> I am adding your explicit ack for the patch:
> Acked-by: Ravi Kumar 

Squashed into relevant commit in next-net, thanks.


Re: [dpdk-dev] [PATCH] net/sfc: use default FEC mode

2018-04-12 Thread Ferruh Yigit
On 4/10/2018 1:48 PM, Andrew Rybchenko wrote:
> All FEC modes are supported and allowed, but none are explicitly
> requested.
> 
> This effectively means that FEC mode is determined solely from cable
> requirements and link partner capabilities / requirements.
> 
> Signed-off-by: Andrew Rybchenko 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v2 0/2] Support for new Ethdev offload APIs

2018-04-12 Thread Ferruh Yigit
On 4/11/2018 12:05 PM, Sunil Kumar Kori wrote:
> Patchset contains changes to support ethdev offload APIs for DPAA and DPAA2
> drivers.
> 
> Offloading support is categorised in the following logical parts:
> 1. If a requested offloading feature is not supported, an error is returned.
> 2. If a requested offloading feature is supported but cannot be disabled, the
>request to disable the offload is silently discarded with a message.
> 3. Otherwise the configuration is successfully offloaded.
> 
> [Changes in v2]
> 1. Incorporated review comments.
> 
> Sunil Kumar Kori (2):
>   net/dpaa: Changes to support ethdev offload APIs
>   net/dpaa2: Changes to support ethdev offload APIs

Series applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH] net/nfp: support LSO offload version 2

2018-04-12 Thread Ferruh Yigit
On 4/11/2018 11:33 AM, Alejandro Lucero wrote:
> This new LSO offload version facilitates how firmware implements
> this functionality and helps improve performance.
> 
> Signed-off-by: Alejandro Lucero 

Applied to dpdk-next-net/master, thanks.


[dpdk-dev] virtio: rte_ethdev port_ids consumed by rte_eal_init()

2018-04-12 Thread Dennis Montgomery
Hi,

I've run into a problem with DPDK v17.08, when built with
CONFIG_RTE_LIBRTE_VIRTIO_PMD=y.  After rte_eal_init() calls rte_bus_probe()
we end up with the first several entries (matching the number of virtio-pci
eth devices on the system) in rte_eth_devices[] showing state ==
RTE_ETH_DEV_ATTACHED.  Subsequently when we try to allocate a new device
(via rte_eth_dev_attach()), rte_eth_dev_find_free_port() returns a nonzero
number for the first device we really want to use.  The device
functions properly so this isn't too much of a problem, but effectively it
limits the number of virtio eth devices to be half of RTE_MAX_ETHPORTS.

I'd greatly appreciate some guidance on how to work around this - i.e.
whether it's fixed in a newer release or if the structures filled in by
rte_bus_probe() can be started without attaching, or whatever.

Thanks in advance,

Dennis Montgomery


Re: [dpdk-dev] [PATCH] net/nfp: add support for hardware RSS v2

2018-04-12 Thread Ferruh Yigit
On 4/11/2018 2:10 PM, Alejandro Lucero wrote:
> Chained metadata instead of prepended metadata was added in
> firmware version 4. However, it could be old firmwares evolving
> but not supporting chained metadata.
> 
> This patch adds support for an old firmware being updated and
> getting a firmware version number higher than 4, but it still not
> implementing chained metadata.
> 
> Signed-off-by: Alejandro Lucero 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for flexible indirection

2018-04-12 Thread Yongseok Koh
On Thu, Apr 12, 2018 at 04:34:56PM +, Ananyev, Konstantin wrote:
> > >
> > > > > >
> > > > > > On Mon, Apr 09, 2018 at 06:04:34PM +0200, Olivier Matz wrote:
> > > > > > > Hi Yongseok,
> > > > > > >
> > > > > > > On Tue, Apr 03, 2018 at 05:12:06PM -0700, Yongseok Koh wrote:
> > > > > > > > On Tue, Apr 03, 2018 at 10:26:15AM +0200, Olivier Matz wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Mon, Apr 02, 2018 at 11:50:03AM -0700, Yongseok Koh wrote:
> > > > > > > > > > When attaching a mbuf, indirect mbuf has to point to start 
> > > > > > > > > > of buffer of
> > > > > > > > > > direct mbuf. By adding buf_off field to rte_mbuf, this 
> > > > > > > > > > becomes more
> > > > > > > > > > flexible. Indirect mbuf can point to any part of direct 
> > > > > > > > > > mbuf by calling
> > > > > > > > > > rte_pktmbuf_attach_at().
> > > > > > > > > >
> > > > > > > > > > Possible use-cases could be:
> > > > > > > > > > - If a packet has multiple layers of encapsulation, 
> > > > > > > > > > multiple indirect
> > > > > > > > > >   buffers can reference different layers of the 
> > > > > > > > > > encapsulated packet.
> > > > > > > > > > - A large direct mbuf can even contain multiple packets in 
> > > > > > > > > > series and
> > > > > > > > > >   each packet can be referenced by multiple mbuf 
> > > > > > > > > > indirections.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Yongseok Koh 
> > > > > > > > >
> > > > > > > > > I think the current API is already able to do what you want.
> > > > > > > > >
> > > > > > > > > 1/ Here is a mbuf m with its data
> > > > > > > > >
> > > > > > > > >off
> > > > > > > > ><-->
> > > > > > > > >   len
> > > > > > > > >   ++   <-->
> > > > > > > > >   ||
> > > > > > > > > +-|v--+
> > > > > > > > > | |---|
> > > > > > > > > m   | buf  |XXX  ||
> > > > > > > > > |  ---|
> > > > > > > > > +-+
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2/ clone m:
> > > > > > > > >
> > > > > > > > >   c = rte_pktmbuf_alloc(pool);
> > > > > > > > >   rte_pktmbuf_attach(c, m);
> > > > > > > > >
> > > > > > > > >   Note that c has its own offset and length fields.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >off
> > > > > > > > ><-->
> > > > > > > > >   len
> > > > > > > > >   ++   <-->
> > > > > > > > >   ||
> > > > > > > > > +-|v--+
> > > > > > > > > | |---|
> > > > > > > > > m   | buf  |XXX  ||
> > > > > > > > > |  ---|
> > > > > > > > > +--^--+
> > > > > > > > >|
> > > > > > > > >   ++
> > > > > > > > > indirect  |
> > > > > > > > > +-|---+
> > > > > > > > > | |---|
> > > > > > > > > c   | buf  | ||
> > > > > > > > > |  ---|
> > > > > > > > > +-+
> > > > > > > > >
> > > > > > > > > offlen
> > > > > > > > > <--><-->
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 3/ remove some data from c without changing m
> > > > > > > > >
> > > > > > > > >rte_pktmbuf_adj(c, 10)   // at head
> > > > > > > > >rte_pktmbuf_trim(c, 10)  // at tail
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please let me know if it fits your needs.
> > > > > > > >
> > > > > > > > No, it doesn't.
> > > > > > > >
> > > > > > > > Trimming head and tail with the current APIs removes data and 
> > > > > > > > make the space
> > > > > > > > available. Adjusting packet head means giving more headroom, 
> > > > > > > > not shifting the
> > > > > > > > buffer itself. If m has two indirect mbufs (c1 and c2) and 
> > > > > > > > those are pointing to
> > > > > > > > difference offsets in m,
> > > > > > > >
> > > > > > > > rte_pktmbuf_adj(c1, 10);
> > > > > > > > rte_pktmbuf_adj(c2, 20);
> > > > > > > >
> > > > > > > > then the owner of c2 regard the first (off+20)B as available 
> > > > > > > > headroom. If it
> > > > > > > > wants to attach outer header, it will overwrite the headroom 
> > > > > > > > even though the
> > > > > > > > owner of c1 is still accessing it. Instead, another mbuf (h1) 
> > > > > > > > for the outer
> > > > > > > > header should be linked by h1->next = c2.
> > > > > > >
> > > > > > > Yes, after these operations c1, c2 and m should become read-only. 
> > > > > > > So, to
> > > > > > > prepend headers, another mbuf has to be inserted before as you 
> > > > > > > suggest. It
> > > > > > > is possible to wrap this in a function rte_pktmbuf_clone_area(m, 
> > > > > > > offset

Re: [dpdk-dev] [PATCH] net/vmxnet3: change the SPDX tag style

2018-04-12 Thread Yong Wang
On 4/9/18, 2:00 AM, "dev on behalf of Hemant Agrawal"  wrote:

Cc: skh...@vmware.com

Signed-off-by: Hemant Agrawal 
---
Acked-by: Yong Wang 

 drivers/net/vmxnet3/base/upt1_defs.h| 7 ++-
 drivers/net/vmxnet3/base/vmxnet3_defs.h | 7 ++-
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/vmxnet3/base/upt1_defs.h 
b/drivers/net/vmxnet3/base/upt1_defs.h
index cf9141b..5fd7a39 100644
--- a/drivers/net/vmxnet3/base/upt1_defs.h
+++ b/drivers/net/vmxnet3/base/upt1_defs.h
@@ -1,9 +1,6 @@
-/*
+/* SPDX-License-Identifier: BSD-3-Clause
  * Copyright (C) 2007 VMware, Inc. All rights reserved.
- *
- * SPDX-License-Identifier:BSD-3-Clause
- *
- */
+ */
 
 /* upt1_defs.h
  *
diff --git a/drivers/net/vmxnet3/base/vmxnet3_defs.h 
b/drivers/net/vmxnet3/base/vmxnet3_defs.h
index a455e27..a30b8f2 100644
--- a/drivers/net/vmxnet3/base/vmxnet3_defs.h
+++ b/drivers/net/vmxnet3/base/vmxnet3_defs.h
@@ -1,9 +1,6 @@
-/*
+/* SPDX-License-Identifier: BSD-3-Clause
  * Copyright (C) 2007 VMware, Inc. All rights reserved.
- *
- * SPDX-License-Identifier:BSD-3-Clause
- *
- */
+ */
 
 /*
  * vmxnet3_defs.h --
-- 
2.7.4





Re: [dpdk-dev] [PATCH v5 02/21] eal: list acceptable init priorities

2018-04-12 Thread Gaëtan Rivet
Hello Neil,

On Thu, Apr 12, 2018 at 07:28:26AM -0400, Neil Horman wrote:
> On Wed, Apr 11, 2018 at 02:04:03AM +0200, Gaetan Rivet wrote:
> > Build a central list to quickly see each priority used by
> > constructors, allowing one to verify that they are both above 100 and in the
> > proper order.
> > 
> > Signed-off-by: Gaetan Rivet 
> > Acked-by: Neil Horman 
> > Acked-by: Shreyansh Jain 
> > ---
> >  lib/librte_eal/common/eal_common_log.c | 2 +-
> >  lib/librte_eal/common/include/rte_bus.h| 2 +-
> >  lib/librte_eal/common/include/rte_common.h | 8 +++-
> >  3 files changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/librte_eal/common/eal_common_log.c 
> > b/lib/librte_eal/common/eal_common_log.c
> > index a27192620..36b9d6e08 100644
> > --- a/lib/librte_eal/common/eal_common_log.c
> > +++ b/lib/librte_eal/common/eal_common_log.c
> > @@ -260,7 +260,7 @@ static const struct logtype logtype_strings[] = {
> >  };
> >  
> >  /* Logging should be first initializer (before drivers and bus) */
> > -RTE_INIT_PRIO(rte_log_init, 101);
> > +RTE_INIT_PRIO(rte_log_init, LOG);
> >  static void
> >  rte_log_init(void)
> >  {
> > diff --git a/lib/librte_eal/common/include/rte_bus.h 
> > b/lib/librte_eal/common/include/rte_bus.h
> > index 6fb08341a..eb9eded4e 100644
> > --- a/lib/librte_eal/common/include/rte_bus.h
> > +++ b/lib/librte_eal/common/include/rte_bus.h
> > @@ -325,7 +325,7 @@ enum rte_iova_mode rte_bus_get_iommu_class(void);
> >   * The constructor has higher priority than PMD constructors.
> >   */
> >  #define RTE_REGISTER_BUS(nm, bus) \
> > -RTE_INIT_PRIO(businitfn_ ##nm, 110); \
> > +RTE_INIT_PRIO(businitfn_ ##nm, BUS); \
> >  static void businitfn_ ##nm(void) \
> >  {\
> > (bus).name = RTE_STR(nm);\
> > diff --git a/lib/librte_eal/common/include/rte_common.h 
> > b/lib/librte_eal/common/include/rte_common.h
> > index 6c5bc5a76..8f04518f7 100644
> > --- a/lib/librte_eal/common/include/rte_common.h
> > +++ b/lib/librte_eal/common/include/rte_common.h
> > @@ -81,6 +81,12 @@ typedef uint16_t unaligned_uint16_t;
> >   */
> >  #define RTE_SET_USED(x) (void)(x)
> >  
> > +#define RTE_PRIORITY_LOG 101
> > +#define RTE_PRIORITY_BUS 110
> > +
> > +#define RTE_PRIO(prio) \
> > +   RTE_PRIORITY_ ## prio
> > +
> >  /**
> >   * Run function before main() with low priority.
> >   *
> > @@ -102,7 +108,7 @@ static void __attribute__((constructor, used)) 
> > func(void)
> >   *   Lowest number is the first to run.
> >   */
> >  #define RTE_INIT_PRIO(func, prio) \
> > -static void __attribute__((constructor(prio), used)) func(void)
> > +static void __attribute__((constructor(RTE_PRIO(prio)), used)) func(void)
> >  
> It just occurred to me that perhaps you should add an RTE_PRIORITY_LAST
> priority, and redefine RTE_INIT to RTE_INIT_PRIO(func, RTE_PRIORITY_LAST)
> for clarity. I presume that constructors with no explicit priority run last,
> but the gcc manual doesn't explicitly say that. It would be a heck of a bug
> to track down if somehow unprioritized constructors ran early.
> 
> Neil
> 

While certainly poorly documented, the behavior is well-defined. I don't see
a situation where the bug you describe could arise.

Adding RTE_PRIORITY_LAST is pretty harmless, but I'm not sure it's
justified to add it. If you still think it is useful, I will do it.

I'd be curious to hear if anyone has had issues of this kind.
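For what it's worth, the behavior under discussion can be checked with a standalone GCC/Clang snippet (no DPDK involved; the macro names below merely mirror the RTE_PRIO token-pasting scheme from the patch):

```c
#include <string.h>

#define PRIORITY_LOG 101
#define PRIORITY_BUS 110
#define PRIO(p) PRIORITY_ ## p          /* token-pasting, as in RTE_PRIO */

static char order[64];

/* priority 101: runs first */
static void __attribute__((constructor(PRIO(LOG)), used)) init_log(void)
{ strcat(order, "log,"); }

/* priority 110: runs second */
static void __attribute__((constructor(PRIO(BUS)), used)) init_bus(void)
{ strcat(order, "bus,"); }

/* no explicit priority: runs after all prioritized constructors */
static void __attribute__((constructor, used)) init_unprioritized(void)
{ strcat(order, "default,"); }
```

On GCC and Clang, `order` ends up as "log,bus,default," before main() runs, i.e. prioritized constructors execute in ascending priority and unprioritized ones after them, which is the behavior Neil's proposed RTE_PRIORITY_LAST would make explicit.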

-- 
Gaëtan Rivet
6WIND


[dpdk-dev] [dpdk-announce] release 18.05 delayed

2018-04-12 Thread Thomas Monjalon
Hi,

The integration deadline is passed by one week, and the first
release candidate is still far from being ready.
This time it is really, really late.
We will try to do this RC1 on the 20th of April, but no guarantee.

Then we may have a lot of new drivers or features to integrate
in the next release candidate. So we need to plan two weeks of work
before releasing the RC2 around the 4th of May, which was the date
initially targeted for the release 18.05.

Usually we need two more weeks of bug fixing in RC3 and RC4,
and a few more days of validation before the release.
It means the release will happen on the 23rd of May in the "best case".

During this time, we need everybody's help to finish the reviews.

If you are not involved in reviews or last minute adjustments,
you can start working on new stuff for 18.08, because the preparation
period before the proposal deadline will be tight. We can think about
delaying the proposal deadline for 18.08 by one week (June 8th).
We should also accept that the 18.08 release might be smaller than usual.

Thanks for your understanding and help




[dpdk-dev] [RFC 0/2] nfp driver fixes

2018-04-12 Thread Aaron Conole
Two fixes, one which is fairly obvious (1/2), the other which may
allow support of non-root users.  These patches are only compile tested
which is why they are submitted as RFC.  After proper testing, I will
resubmit them as PATCH (with any suggested / recommended changes).

Aaron Conole (2):
  nfp: unlink the appropriate lock file
  nfp: allow for non-root user

 drivers/net/nfp/nfp_nfpu.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

-- 
2.14.3



[dpdk-dev] [RFC 2/2] nfp: allow for non-root user

2018-04-12 Thread Aaron Conole
Currently, the nfp lock files are taken from the global lock file
location, which will work when the user is running as root.  However,
some distributions and applications (notably ovs 2.8+ on RHEL/Fedora)
run as a non-root user.

Signed-off-by: Aaron Conole 
---
 drivers/net/nfp/nfp_nfpu.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/nfp/nfp_nfpu.c b/drivers/net/nfp/nfp_nfpu.c
index 2ed985ff4..ae2e07220 100644
--- a/drivers/net/nfp/nfp_nfpu.c
+++ b/drivers/net/nfp/nfp_nfpu.c
@@ -18,6 +18,22 @@
 #define NFP_CFG_EXP_BAR 7
 
 #define NFP_CFG_EXP_BAR_CFG_BASE   0x3
+#define NFP_LOCKFILE_PATH_FMT "%s/nfp%d"
+
+/* get nfp lock file path (/var/lock if root, $HOME otherwise) */
+static void
+nspu_get_lockfile_path(char *buffer, int bufsz, nfpu_desc_t *desc)
+{
+   const char *dir = "/var/lock";
+   const char *home_dir = getenv("HOME");
+
+   if (getuid() != 0 && home_dir != NULL)
+   dir = home_dir;
+
+   /* use current prefix as file path */
+   snprintf(buffer, bufsz, NFP_LOCKFILE_PATH_FMT, dir,
+   desc->nfp);
+}
 
 /* There could be other NFP userspace tools using the NSP interface.
  * Make sure there is no other process using it and locking the access for
@@ -30,9 +46,7 @@ nspv_aquire_process_lock(nfpu_desc_t *desc)
struct flock lock;
char lockname[30];
 
-   memset(&lock, 0, sizeof(lock));
-
-   snprintf(lockname, sizeof(lockname), "/var/lock/nfp%d", desc->nfp);
+   nspu_get_lockfile_path(lockname, sizeof(lockname), desc);
 
/* Using S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH */
desc->lock = open(lockname, O_RDWR | O_CREAT, 0666);
@@ -106,7 +120,6 @@ nfpu_close(nfpu_desc_t *desc)
rte_free(desc->nspu);
close(desc->lock);
 
-   snprintf(lockname, sizeof(lockname), "/var/lock/nfp%d", desc->nfp);
-   unlink(lockname);
+   nspu_get_lockfile_path(lockname, sizeof(lockname), desc);
return 0;
 }
-- 
2.14.3
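The path-selection logic of this patch can be sketched as a standalone helper (for illustration only: the uid is passed in explicitly instead of calling getuid(), and the nfpu descriptor is reduced to a plain device index):

```c
#include <stdio.h>
#include <stdlib.h>

/* Pick /var/lock for root, $HOME for unprivileged users, then append
 * the per-device lock file name, mirroring nspu_get_lockfile_path(). */
static void get_lockfile_path(char *buf, size_t bufsz, int nfp_idx,
			      unsigned int uid)
{
	const char *dir = "/var/lock";          /* default, root-writable */
	const char *home = getenv("HOME");

	if (uid != 0 && home != NULL)
		dir = home;                     /* non-root: per-user location */

	snprintf(buf, bufsz, "%s/nfp%d", dir, nfp_idx);
}
```

Both open() in the lock-acquire path and unlink() in the close path must derive the name from this one helper, which is also what makes the 1/2 fix (unlinking the right file) hold for non-root users.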



[dpdk-dev] [RFC 1/2] nfp: unlink the appropriate lock file

2018-04-12 Thread Aaron Conole
The nfpu_close needs to unlink the lock file associated with the
nfp descriptor, not lock file 0.

Fixes: d12206e00590 ("net/nfp: add NSP user space interface")

Cc: sta...@dpdk.org
Signed-off-by: Aaron Conole 
---
 drivers/net/nfp/nfp_nfpu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_nfpu.c b/drivers/net/nfp/nfp_nfpu.c
index f11afef35..2ed985ff4 100644
--- a/drivers/net/nfp/nfp_nfpu.c
+++ b/drivers/net/nfp/nfp_nfpu.c
@@ -101,8 +101,12 @@ nfpu_open(struct rte_pci_device *pci_dev, nfpu_desc_t 
*desc, int nfp)
 int
 nfpu_close(nfpu_desc_t *desc)
 {
+   char lockname[30];
+
rte_free(desc->nspu);
close(desc->lock);
-   unlink("/var/lock/nfp0");
+
+   snprintf(lockname, sizeof(lockname), "/var/lock/nfp%d", desc->nfp);
+   unlink(lockname);
return 0;
 }
-- 
2.14.3



Re: [dpdk-dev] [PATCH v6 2/2] eal/vfio: export internal vfio functions

2018-04-12 Thread Thomas Monjalon
12/04/2018 08:23, Hemant Agrawal:
> This patch moves some of the internal vfio functions from
> eal_vfio.h to rte_vfio.h for common use with an "rte_" prefix.
> 
> This patch also changes the FSLMC bus usage from the internal
> VFIO functions to the external ones with an "rte_" prefix.
> 
> Signed-off-by: Hemant Agrawal 
> Acked-by: Anatoly Burakov 

Applied, thanks





Re: [dpdk-dev] [PATCH] vfio: fix device hotplug when several devices per group

2018-04-12 Thread Thomas Monjalon
10/04/2018 12:23, Anatoly Burakov:
> We only need to perform DMA mapping for the first device in the first group.
> At the time of mapping, we haven't yet added the device into the group,
> so the count is expected to be zero.
> 
> Fixes: 810bfa64c673 ("vfio: fix index for tracking devices in a group")
> Fixes: a9c349e3a100 ("vfio: fix device unplug when several devices per group")
> Fixes: 94c0776b1bad ("vfio: support hotplug")
> Cc: alejandro.luc...@netronome.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Anatoly Burakov 

Applied, thanks




Re: [dpdk-dev] [PATCH v6] vfio: change to use generic multi-process channel

2018-04-12 Thread Thomas Monjalon
05/04/2018 16:26, Tan, Jianfeng:
> Hi Anatoly,
> 
> An obvious action would be to change rte_mp_request to 
> rte_mp_request_sync(). Before sending out the new patch, do you have any 
> other comments on this patch?
> 
> Hi Thomas,
> 
> Several patches will change vfio; may I know your preferred apply 
> sequence? (I'm trying to find out which patch I shall rebase on; of 
> course, I can wait until the other patches are applied.)
> 
> - http://dpdk.org/dev/patchwork/patch/37258/
> - http://dpdk.org/dev/patchwork/patch/37152/
> - http://dpdk.org/dev/patchwork/patch/37082/
> - http://dpdk.org/dev/patchwork/patch/37047/

All, but first one, are applied now.
I guess you can rebase on master.





Re: [dpdk-dev] [PATCH v2 0/5] allow procinfo and pdump on eth vdev

2018-04-12 Thread Thomas Monjalon
Hi Jianfeng,

05/04/2018 19:44, Jianfeng Tan:
> As we know, we have below limitations in vdev:
>   - dpdk-procinfo cannot get the stats of (most) vdev in primary process;
>   - dpdk-pdump cannot dump the packets for (most) vdev in the primary process;
>   - secondary process cannot use (most) vdev in primary process.
> 
> The very first reason is that the secondary process actually does not know
> the existence of those vdevs as vdevs are chained on a linked list, and
> not shareable to secondary.
> 
> In this patch series, we would like to propose a vdev sharing model like this:
>   - As a secondary process boots, all devices (including vdev) in primary
> will be automatically shared. After both primary and secondary process
> booted,
>   - Device add/remove in primary will be translated to device hot plug/unplug
> events in secondary processes. (TODO)
>   - Device add in secondary
> * If that kind of device supports multi-process, the secondary will
>   request the primary to probe the device, and the primary will share
>   it with the secondary. It's not necessary to have a secondary-private
>   device in this case. (TODO)
> * If that kind of device does not support multi-process, the secondary
>   will probe the device by itself, and the port id is shared among
>   all primary/secondary processes.

Are you OK to consider this series for DPDK 18.08?




Re: [dpdk-dev] [PATCH] eal: fix compilation without VFIO

2018-04-12 Thread Thomas Monjalon
12/04/2018 16:13, Burakov, Anatoly:
> On 12-Apr-18 2:34 PM, Shahaf Shuler wrote:
> > A compilation error occurred when compiling with CONFIG_RTE_EAL_VFIO=n.
> > 
> > == Build lib/librte_eal/linuxapp/eal
> >CC eal_vfio.o
> > /download/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:1535:1: error: no
> > previous prototype for 'rte_vfio_dma_map' [-Werror=missing-prototypes]
> >   rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t
> > iova,
> >   ^
> > /download/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:1542:1: error: no
> > previous prototype for 'rte_vfio_dma_unmap' [-Werror=missing-prototypes]
> >   rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused
> > iova,
> >   ^
> > 
> > As there is no use for those dummy functions without VFIO, remove them
> > completely.
> 
> These functions are part of public API, like rest of functions in this 
> header. They're in the map file. Should we perhaps go the BSD way and 
> provide EAL with dummy prototypes as well? See bsdapp/eal/eal.c:763 onwards.

Why using dummy prototypes?
Because the prototypes in rte_vfio.h are under #ifdef VFIO_PRESENT ?
Is it possible to always define the prototypes in rte_vfio.h ?
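A minimal sketch of the always-defined-prototype approach being asked about (the guard macro usage and stub bodies here are illustrative, not the actual EAL code):

```c
#include <stdint.h>

/* Prototypes declared unconditionally, as they would be in rte_vfio.h,
 * so -Werror=missing-prototypes is satisfied in both build variants. */
int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);

#ifndef VFIO_PRESENT
/* Without VFIO support, keep the symbols (they are listed in the version
 * map) but make them always fail. */
int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
{
	(void)vaddr; (void)iova; (void)len;
	return -1;
}

int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
{
	(void)vaddr; (void)iova; (void)len;
	return -1;
}
#endif
```

This is essentially the BSD approach referenced above: the prototype stays visible regardless of VFIO_PRESENT, and only the definitions are conditional.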




Re: [dpdk-dev] [PATCH v3] net/ixgbe: Add access and locking APIs for MDIO

2018-04-12 Thread Zhang, Qi Z
Hi Choudaha:

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Shweta Choudaha
> Sent: Wednesday, April 11, 2018 10:00 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo ; Ananyev, Konstantin
> ; Zhang, Helin ; Yigit,
> Ferruh ; shweta.choud...@att.com
> Subject: [dpdk-dev] [PATCH v3] net/ixgbe: Add access and locking APIs for
> MDIO

Nitpick: title should not start with uppercase.

> +
> +EXPERIMENTAL {
> + global:
> +
> + rte_pmd_ixgbe_lock_mdio;
> + rte_pmd_ixgbe_unlock_mdio;

Can we rename these to rte_pmd_ixgbe_mdio_lock and rte_pmd_ixgbe_mdio_unlock, so all 
mdio functions are listed together when sorted alphabetically?


> + rte_pmd_ixgbe_mdio_read_unlocked;
> + rte_pmd_ixgbe_mdio_write_unlocked;

And these could be rte_pmd_ixgbe_mdio_unlocked_read/write, to follow the same 
pattern of placing the action after the object.

Regards
Qi

> +} DPDK_18.05;
> --
> 2.11.0



Re: [dpdk-dev] [PATCH 1/2] net/tap: add tun support

2018-04-12 Thread Varghese, Vipin
Hi Ophir,

Please find my answers inline to the queries.

> -Original Message-
> From: Ophir Munk [mailto:ophi...@mellanox.com]
> Sent: Thursday, April 12, 2018 5:19 PM
> To: Varghese, Vipin ; dev@dpdk.org;
> pascal.ma...@6wind.com; Yigit, Ferruh ; Thomas
> Monjalon ; Olga Shern ;
> Shahaf Shuler 
> Subject: RE: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> 
> Hi Vipin,
> This patch (adding TUN to TAP) has been Acked and accepted in next-net
> branch.
> I have some questions regarding the implementation (please find below).
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Vipin Varghese
> > Sent: Tuesday, April 03, 2018 12:38 AM
> > To: dev@dpdk.org; pascal.ma...@6wind.com; ferruh.yi...@intel.com
> > Cc: Vipin Varghese 
> > Subject: [dpdk-dev] [PATCH 1/2] net/tap: add tun support
> >
> > The change adds functional TUN PMD logic to the existing TAP PMD.
> > The TUN PMD can be initialized with 'net_tunX', where 'X' represents a
> > unique id. The PMD supports the interface argument, while MAC address and
> > remote are not supported.
> >
> 
> [...]
> 
> >
> > +   /*
> > +* TUN and TAP are created with IFF_NO_PI disabled.
> > +* For the TUN PMD this is mandatory, as the fields are
> > +* used by the kernel tun.c to determine whether it is an
> > +* IP or non-IP packet.
> > +*
> > +* The logic fetches the first byte of data from the mbuf
> > +* and checks whether it is v4 or v6. If neither matches,
> > +* the default value 0x00 is used for the protocol field.
> > +*/
> > +   char *buff_data = rte_pktmbuf_mtod(seg, void *);
> > +   j = (*buff_data & 0xf0);
> > +   if (j & (0x40 | 0x60))
> > +   pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
> > +
> 
> 1. Accessing the first byte here assumes it is the first IP header byte
> (layer 3), which is correct for TUN. For TAP, however, the first byte
> belongs to the Ethernet destination address (layer 2).
> Please explain how this logic will work for TAP.

Based on the Linux code base ('drivers/net/tap.c' and 'drivers/net/tun.c', from
3.13 to 4.16):

Please find my observations below
1. File tun.c, function tun_get_user: a check for 'tun->flags & TUN_TYPE_MASK'
is done, and if a non-IP packet is received the 'rx_dropped' counter is updated.
2. File tap.c: there are no checks of 'tap->flags' for IFF_NO_PI in the RX data
path. The 'rx_dropped' counter is updated in 'tap_handle_frame'.

Please find my reasoning below
1. The first approach was to have separate TX and RX functions for TAP and TUN,
but this would introduce code duplication, hence the code was reworked as above.
2. During my internal testing, assigning a dummy value to the protocol field of
TAP packets did not show a difference in behaviour. Maybe there are some
specific cases where this fails.

If there is a difference in behaviour, could you please share it?

> 
> 2. If the first TUN byte contains 0x2X (which is neither IPv4 nor IPv6), it
> will end up setting pi.proto to 0xdd86.
> Please explain how this logic will work for non-IP packets in TUN.

I see your point. You are correct about this. Thanks for pointing it out; may I
send a correction for this as follows?

"""
-   if (j & (0x40 | 0x60))
-   pi.proto = (j == 0x40) ? 0x0008 : 0xdd86;
+   pi.proto = (j == 0x40) ? 0x0008 : 
+   (j == 0x60) ? 0xdd86 :
+   0x00;
"""

