[dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK

2014-06-03 Thread Anatoly Burakov
This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a kernel driver, available since Linux 3.6, that allows secure
DMA from userspace by using the IOMMU instead of working directly with
physical memory as igb_uio does.

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional

v2 fixes:
* Fixed a couple of resource leaks

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various issues of commit atomicity

v4 fixes:
* Rebased on top of 5ebbb17281645b23359fbd49133bb639b63ba88c
* Fixed a typo in EAL command-line help text

Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL
    command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |  36 +
 app/test/test_pci.c                                |   4 +-
 config/common_linuxapp                             |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |   2 +-
 lib/librte_eal/common/Makefile                     |   1 +
 lib/librte_eal/common/eal_common_pci.c             |  16 +-
 lib/librte_eal/common/include/rte_pci.h            |   5 +-
 .../common/include/rte_pci_dev_feature_defs.h      |  46 ++
 .../common/include/rte_pci_dev_features.h          |  44 ++
 lib/librte_eal/linuxapp/Makefile                   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile               |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |  36 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 285 +++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 473 ++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 781 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |   4 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}        | 157 +++--
 tools/setup.sh                                     | 172 -
 29 files changed, 2545 insertions(+), 587 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (83%)

-- 
1.8.1.4



[dpdk-dev] [PATCH v4 04/20] pci: distinguish between legitimate failures and non-fatal errors

2014-06-03 Thread Anatoly Burakov
Currently, EAL does not distinguish between actual failures and expected
initialization errors. For example, a driver sometimes fails to initialize
because it was never supposed to be initialized in the first place, such
as when the device is not managed by that driver.

This patch makes EAL fail on actual initialization errors while still
skipping over expected initialization errors.
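
The return-code convention described above can be sketched in isolation.
This is a minimal illustration only: struct fake_driver, probe_ok,
probe_skip, probe_fail and probe_all are hypothetical names, not part of
EAL. A negative value is a real failure, a positive value means "not this
driver's device", and zero means success.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PCI driver. probe() returns <0 on a real
 * failure, >0 if the device is not meant for this driver, 0 on success. */
struct fake_driver {
	int (*probe)(int device_id);
};

static int probe_ok(int device_id)   { (void)device_id; return 0; }
static int probe_skip(int device_id) { (void)device_id; return 1; }
static int probe_fail(int device_id) { (void)device_id; return -1; }

/* Returns -1 if any driver reports a real failure, 0 if a driver
 * initialized the device, 1 if no driver claimed it (non-fatal). */
static int probe_all(const struct fake_driver *drivers, size_t n,
		     int device_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = drivers[i].probe(device_id);

		if (rc < 0)
			return -1;	/* legitimate failure: stop here */
		if (rc > 0)
			continue;	/* not managed by this driver */
		return 0;		/* device initialized */
	}
	return 1;			/* no driver found for the device */
}
```

A caller following this convention would abort only when probe_all()
returns a negative value and would silently skip the device when it
returns 1.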

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_pci.c    | 16 +---
 lib/librte_eal/linuxapp/eal/eal_pci.c     |  7 ---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 7c23e86..1fb8f2c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)

 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
rc = rte_eal_pci_probe_one_driver(dr, dev);
if (rc < 0)
/* negative value is an error */
-   break;
+   return -1;
if (rc > 0)
/* positive value means driver not found */
continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
;
return 0;
}
-   return -1;
+   return 1;
 }

 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
struct rte_pci_device *dev = NULL;
struct rte_devargs *devargs;
int probe_all = 0;
+   int ret = 0;

if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)

/* probe all or only whitelisted devices */
if (probe_all)
-   pci_probe_all_drivers(dev);
+   ret = pci_probe_all_drivers(dev);
else if (devargs != NULL &&
-   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-   pci_probe_all_drivers(dev) < 0)
+   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+   ret = pci_probe_all_drivers(dev);
+   if (ret < 0)
rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 628813b..0b779ec 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
struct rte_pci_id *id_table;
+   int ret = 0;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -431,13 +432,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
if (dev->devargs != NULL &&
dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-   return 0;
+   return 1;
}

if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
/* map resources for devices that use igb_uio */
-   if (pci_uio_map_resource(dev) < 0)
-   return -1;
+   if ((ret = pci_uio_map_resource(dev)) != 0)
+   return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ae4e716..426769b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -137,7 +137,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
}

RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-   return -1;
+   return 1;
 }

 static int
@@ -284,7 +284,7 @@

[dpdk-dev] [PATCH v4 06/20] igb_uio: make igb_uio compilation optional

2014-06-03 Thread Anatoly Burakov
Currently, igb_uio is always compiled. Some Linux distributions may not
want to include igb_uio with DPDK, so we need to make sure that igb_uio
compilation for Linuxapp targets can be optional.

Signed-off-by: Anatoly Burakov 
---
 config/common_linuxapp           | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..b17e37e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y

 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@

 include $(RTE_SDK)/mk/rte.vars.mk

+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 07/20] igb_uio: Moved interrupt type out of igb_uio

2014-06-03 Thread Anatoly Burakov
Move the interrupt type enum out of igb_uio and rename it to be more
generic. The unusual header naming and separation is done mostly to
make the upcoming virtio patches easier to port to the dpdk.org tree.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/Makefile                 |  1 +
 lib/librte_eal/common/include/rte_pci.h        |  1 +
 .../common/include/rte_pci_dev_feature_defs.h  | 46 +
 .../common/include/rte_pci_dev_features.h      | 44 
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c      | 48 +-
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0016fc5..e2a3f3a 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 11b8c13..e653027 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include 
 #include 
 #include 
+
 #include 

 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+   RTE_INTR_MODE_NONE = 0,
+   RTE_INTR_MODE_LEGACY,
+   RTE_INTR_MODE_MSI,
+   RTE_INTR_MODE_MSIX,
+   RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Inte

[dpdk-dev] [PATCH v4 08/20] vfio: add support for VFIO in Linuxapp targets

2014-06-03 Thread Anatoly Burakov
Add VFIO compilation option to common Linuxapp config.

Signed-off-by: Anatoly Burakov 
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index b17e37e..2ed4b7e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y

 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 09/20] vfio: add VFIO header

2014-06-03 Thread Anatoly Burakov
Add a header that determines whether VFIO support should be compiled
in. If VFIO is enabled in the config (it is enabled by default), the
header also checks the kernel version. If VFIO is enabled in the config
and the kernel version is 3.6+, VFIO_PRESENT is defined. This is the
macro that should be used to determine whether VFIO support is being
compiled in.
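
The two conditions above can be expressed as a small runtime predicate.
This is only a sketch of the logic, not the header itself (which makes
the decision at compile time with the preprocessor):
SKETCH_KERNEL_VERSION and vfio_would_be_present are hypothetical names,
and the macro mirrors the classic packing of the kernel's
KERNEL_VERSION() so that the example does not depend on kernel headers.

```c
/* Same packing as the kernel's classic KERNEL_VERSION() macro,
 * redefined under a hypothetical name to keep the sketch standalone. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns 1 when VFIO support would be compiled in: the config option
 * is enabled and the kernel version code is 3.6.0 or newer. */
static int vfio_would_be_present(int config_enabled,
				 unsigned int version_code)
{
	return config_enabled &&
	       version_code >= SKETCH_KERNEL_VERSION(3, 6, 0);
}
```

Code elsewhere in EAL would then wrap VFIO-specific paths in
#ifdef VFIO_PRESENT rather than repeating the version check.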

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include 
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include 
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c

2014-06-03 Thread Anatoly Burakov
eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index d958014..5f3be5f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif

 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 14/20] pci: enable VFIO device binding

2014-06-03 Thread Anatoly Burakov
Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first; if the device is not mapped that way,
fall back to IGB_UIO.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a0abec8..8a9cbf9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,27 @@ error:
return -1;
 }

+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+   int ret, mapped = 0;
+
+   /* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+   if (pci_vfio_is_enabled()) {
+   if ((ret = pci_vfio_map_resource(dev)) == 0)
+   mapped = 1;
+   else if (ret < 0)
+   return ret;
+   }
+#endif
+   /* map resources for devices that use igb_uio */
+   if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
+   return ret;
+
+   return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +421,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+   int ret;
struct rte_pci_id *id_table;
-   int ret = 0;

for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {

@@ -436,8 +457,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
}

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-   /* map resources for devices that use igb_uio */
-   if ((ret = pci_uio_map_resource(dev)) != 0)
+   if ((ret = pci_map_device(dev)) != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -473,5 +493,21 @@ rte_eal_pci_init(void)
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
+#ifdef VFIO_PRESENT
+   pci_vfio_enable();
+
+   if (pci_vfio_is_enabled()) {
+
+   /* if we are primary process, create a thread to communicate with
+* secondary processes. the thread will use a socket to wait for
+* requests from secondary process to send open file descriptors,
+* because VFIO does not allow multiple open descriptors on a group or
+* VFIO container.
+*/
+*/
+   if (internal_config.process_type == RTE_PROC_PRIMARY &&
+   pci_vfio_mp_sync_setup() < 0)
+   return -1;
+   }
+#endif
return 0;
 }
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 02/20] pci: move uio mapping code to a separate file

2014-06-03 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 403 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 
 4 files changed, 474 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b052820..d958014 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fd88bd0..628813b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */

 #include 
-#include 
-#include 
 #include 
 #include 

@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"

 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct pci_map {
-   void *addr;
-   uint64_t offset;
-   uint64_t size;
-   uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-   TAILQ_ENTRY(mapped_pci_resource) next;
-
-   struct rte_pci_addr pci_addr;
-   char path[PATH_MAX];
-   int nb_maps;
-   struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;

 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }

 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
void *mapaddr;

@@ -147,342 +123,6 @@ fail:
return NULL;
 }

-#define OFF_MAX  ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-   int i;
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   uint64_t offset, size;
-
-   for (i = 0; i != nb_maps; i++) {
- 
-   /* check if map directory exists */
-   rte_snprintf(dirname, sizeof(dirname), 
-   "%s/maps/map%u", devname, i);
- 
-   if (access(dirname, F_OK) != 0)
-   break;
- 
-   /* get mapping offset */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/offset", dirname);
-   if (pci_parse_sysfs_value(filename, &offset) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse offset of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
- 
-   /* get mapping size */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/size", dirname);
-   if (pci_parse_sysfs_value(filename, &size) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse size of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
- 
-   /* get mapping physical address */
-   rte_snprintf(filename, sizeof(filename),
-   "%s/addr", dirname);
-   if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-   RTE_LOG(ERR, EAL,
-   "%s(): cannot parse addr of %s\n",
-   __func__, dirname);
-   return (-1);
-   }
-
-   if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-   RTE_LOG(ERR, EAL,
-   "%s(): offset/size exceed system max value\n",
-   __func__); 
-   return (-1);
-   }
-
-   maps[i].offset = offset;
-   maps[i].size = size;
-}
-   return (i);
-}
-
-static 

[dpdk-dev] [PATCH v4 17/20] test app: adding unit tests for VFIO EAL command-line parameter

2014-06-03 Thread Anatoly Burakov
Add unit tests for the VFIO interrupt type command-line parameter. The
test app cannot tell whether VFIO is compiled in (the eal_vfio.h header
is internal to the Linuxapp EAL), so the flag is checked regardless.
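
For orientation, the accepted values exercised by these tests ("legacy",
"msi", "msix", with anything else rejected) could map onto the
rte_intr_mode enum from patch 07/20 roughly as follows. This is a sketch
under assumptions: parse_vfio_intr is a hypothetical helper, not the
actual EAL option parser.

```c
#include <string.h>

/* Enum values as introduced in patch 07/20. */
enum rte_intr_mode {
	RTE_INTR_MODE_NONE = 0,
	RTE_INTR_MODE_LEGACY,
	RTE_INTR_MODE_MSI,
	RTE_INTR_MODE_MSIX,
	RTE_INTR_MODE_MAX
};

/* Hypothetical parser: stores the mode matching arg and returns 0,
 * or returns -1 (leaving *mode untouched) for an unknown value. */
static int parse_vfio_intr(const char *arg, enum rte_intr_mode *mode)
{
	static const struct {
		const char *name;
		enum rte_intr_mode value;
	} map[] = {
		{ "legacy", RTE_INTR_MODE_LEGACY },
		{ "msi",    RTE_INTR_MODE_MSI },
		{ "msix",   RTE_INTR_MODE_MSIX },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(arg, map[i].name) == 0) {
			*mode = map[i].value;
			return 0;
		}
	}
	return -1;
}
```

With such a parser, the first three test cases in this patch correspond
to a return value of 0 and the "invalid" case to -1.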

Signed-off-by: Anatoly Burakov 
---
 app/test/test_eal_flags.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..a0ee4e6 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
const char *argv11[] = {prgname, "--file-prefix=virtaddr",
"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};

+   /* try running with --vfio-intr INTx flag */
+   const char *argv12[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+   /* try running with --vfio-intr MSI flag */
+   const char *argv13[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+   /* try running with --vfio-intr MSI-X flag */
+   const char *argv14[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+   /* try running with --vfio-intr invalid flag */
+   const char *argv15[] = {prgname, "--file-prefix=intr",
+   "-c", "1", "-n", "2", "--vfio-intr=invalid"};
+

if (launch_proc(argv0) == 0) {
printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
printf("Error - process did not run ok with --base-virtaddr parameter\n");
return -1;
}
+   if (launch_proc(argv12) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr INTx parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv13) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv14) != 0) {
+   printf("Error - process did not run ok with "
+   "--vfio-intr MSI-X parameter\n");
+   return -1;
+   }
+   if (launch_proc(argv15) == 0) {
+   printf("Error - process ran ok with "
+   "--vfio-intr invalid parameter\n");
+   return -1;
+   }
return 0;
 }
 #endif
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 16/20] eal: make --no-huge use mmap instead of malloc

2014-06-03 Thread Anatoly Burakov
This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc would work, but we want to guarantee that the
memory is page-aligned, so mmap is used to be safe.
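
The alignment guarantee can be checked with a small standalone sketch.
The helper names alloc_page_aligned and is_page_aligned are hypothetical.
The sketch also passes -1 as the file descriptor, which is the portable
convention for anonymous mappings; that is a choice made here, not
something taken from the patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: anonymous private mapping of len bytes.
 * Returns NULL on failure instead of MAP_FAILED. */
static void *alloc_page_aligned(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}

/* Returns 1 if addr sits on a page boundary, 0 otherwise. */
static int is_page_aligned(const void *addr)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);

	return ((uintptr_t)addr % page) == 0;
}
```

Memory obtained this way is released with munmap() rather than free().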

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 8d1edd9..315214b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)

/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
-   addr = malloc(internal_config.memory);
+   addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+   if (addr == MAP_FAILED) {
+   RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+   strerror(errno));
+   return -1;
+   }
mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
mcfg->memseg[0].addr = addr;
mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind

2014-06-03 Thread Anatoly Burakov
Renaming the igb_uio_bind script to dpdk_nic_bind to have a generic name
since we're now supporting two drivers.

Signed-off-by: Anatoly Burakov 
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 -
 tools/setup.sh                              | 16 +-
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index 33adcf4..1e517e7 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]

 def usage():
 '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):

 def check_modules():
 '''Checks that igb_uio is loaded'''
+global dpdk_drivers

 fd = file("/proc/modules")
 loaded_mods = fd.readlines()
 fd.close()
-mod = "igb_uio"
+
+# list of supported modules
+mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]

 # first check if module is loaded
-found = False
 for line in loaded_mods:
-if line.startswith(mod):
-found = True
-break
-if not found:
-print "Error - module %s not loaded" %mod
+for mod in mods:
+if line.startswith(mod["Name"]):
+mod["Found"] = True
+# special case for vfio_pci (module is named vfio-pci,
+# but its .ko is named vfio_pci)
+elif line.replace("_", "-").startswith(mod["Name"]):
+mod["Found"] = True
+
+# check if we have at least one loaded module
+if True not in [mod["Found"] for mod in mods]:
+print "Error - no supported modules are loaded"
 sys.exit(1)

+# change DPDK driver list to only contain drivers that are loaded
+dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
 '''return true if a device is assigned to a driver. False otherwise'''
 return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
 the pci addresses (domain:bus:slot.func). The values are themselves
 dictionaries - one for each NIC.'''
 global devices
+global dpdk_drivers

 # clear any old data
 devices = {} 
@@ -240,10 +254,11 @@ def get_nic_details():

 # add igb_uio to list of supporting modules if needed
 if "Module_str" in devices[d]:
-if "igb_uio" not in devices[d]["Module_str"]:
-devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+for driver in dpdk_drivers:
+if driver not in devices[d]["Module_str"]:
+devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
 else:
-devices[d]["Module_str"] = "igb_uio"
+devices[d]["Module_str"] = ",".join(dpdk_drivers)

 # make sure the driver and module strings do not have any duplicates
 if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
 dev["Driver_str"] = "" # clear driver string

 # if we are binding to one of DPDK drivers, add PCI id's to that driver
-if driver == "igb_uio":
+if driver in dpdk_drivers:
 filename = "/sys/bus/pci/drivers/%s/new_id" % driver
 try:
 f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
 '''Function called when the script is passed the "--status" option. Displays
 to the user what devices are bound to the igb_uio driver, the kernel driver
 or to no driver'''
+global dpdk_drivers
 kernel_drv = []
-uio_drv = []
+dpdk_drv = []
 no_drv = []
+
 # split our list of devices into the three categories above
 for d in devices.keys():
 if not has_driver(d):
 no_drv.append(devices[d])
 continue
-if devices[d]["Driver_str"] == "igb_uio":
-uio_drv.append(devices[d])
+if devices[d]["Driver_str"] in dpdk_drivers:
+dpdk_drv.append(devices[d])
 else:
 kernel_drv.append(devices[d])

 # print each category separately, so we can clearly see what's used by DPDK
-display_devices("Network devices using IGB_UIO driver", uio_drv, \
+display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
 "drv=%(Driver_str)s unused=%(Module_str)s")
 display_devices("Network devices using kernel driver", kernel_drv,
 "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index 39be8fc..e0671b8 100755
-

[dpdk-dev] [PATCH v4 12/20] vfio: create mapping code for VFIO

2014-06-03 Thread Anatoly Burakov
Adding code to support VFIO mapping (primary processes only). Most of
the work is done via ioctl() calls on either /dev/vfio/vfio (the
container) or /dev/vfio/$GROUP_NR (an IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (MSI-X BAR cannot be mmaped, so skipping it)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
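
The container/group/device part of the steps above can be sketched roughly
as follows. This is an illustration only, not the code from this patch: the
function names are hypothetical, step 2 (finding the group number via sysfs)
is assumed to have happened already, and BAR mapping and interrupt setup are
left out. Only ioctls from <linux/vfio.h> are used.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (not from the patch): walks steps 1, 3, 4, 5 and 7
 * for one device. Returns a device fd, or -1 on any failure.
 */
int
sketch_vfio_get_device_fd(int group_nr, const char *pci_addr)
{
	struct vfio_group_status status;
	char path[64];
	int container_fd, group_fd, dev_fd;

	/* step 1: open the container and check the API version */
	container_fd = open("/dev/vfio/vfio", O_RDWR);
	if (container_fd < 0)
		return -1;
	if (ioctl(container_fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
		goto close_container;

	/* step 3: open the IOMMU group the device belongs to */
	snprintf(path, sizeof(path), "/dev/vfio/%d", group_nr);
	group_fd = open(path, O_RDWR);
	if (group_fd < 0)
		goto close_container;

	/* step 4: the group is viable only if no device in it is
	 * still held by a non-VFIO driver */
	memset(&status, 0, sizeof(status));
	status.argsz = sizeof(status);
	if (ioctl(group_fd, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto close_group;

	/* step 5: attach the group, then select the type 1 IOMMU backend */
	if (ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container_fd) < 0 ||
	    ioctl(container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto close_group;

	/* step 7: a failure here may only mean "not bound to VFIO" */
	dev_fd = ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
	if (dev_fd < 0)
		goto close_group;

	/* container and group fds are left open on purpose: they must
	 * outlive the device fd, so a real caller would store them */
	return dev_fd;

close_group:
	close(group_fd);
close_container:
	close(container_fd);
	return -1;
}

/* step 6: map one memory region for DMA with IOVA equal to its PA */
int
sketch_vfio_dma_map(int container_fd, void *vaddr, uint64_t phys_addr,
		uint64_t len)
{
	struct vfio_iommu_type1_dma_map dma_map;

	memset(&dma_map, 0, sizeof(dma_map));
	dma_map.argsz = sizeof(dma_map);
	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	dma_map.vaddr = (uint64_t)(uintptr_t)vaddr;
	dma_map.iova = phys_addr;	/* 1:1 correspondence of IOVA to PA */
	dma_map.size = len;

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map) < 0 ? -1 : 0;
}
```

On a machine without VFIO both helpers simply return -1, which matches the
"can fail, device is not bound to VFIO" case described in step 7.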

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   2 +
 lib/librte_eal/linuxapp/eal/eal.c  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 706 +
 .../linuxapp/eal/include/eal_internal_cfg.h|   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h |   6 +
 6 files changed, 750 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5f3be5f..cf9f026 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE

 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 9d2675b..aeb5903 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,8 @@ eal_parse_args(int argc, char **argv)
internal_config.force_sockets = 0;
internal_config.syslog_facility = LOG_DAEMON;
internal_config.xen_dom0_support = 0;
+   /* if set to NONE, interrupt mode is determined automatically */
+   internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 000..e1d6973
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,706 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INC

[dpdk-dev] [PATCH v4 03/20] pci: fixing errors in a previous commit found by checkpatch

2014-06-03 Thread Anatoly Burakov

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 61f09cc..ae4e716 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -69,7 +69,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
if (pci_parse_sysfs_value(filename, &offset) < 0) {
RTE_LOG(ERR, EAL,
"%s(): cannot parse offset of %s\n", __func__, dirname);
-   return (-1);
+   return -1;
}

/* get mapping size */
@@ -77,7 +77,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
if (pci_parse_sysfs_value(filename, &size) < 0) {
RTE_LOG(ERR, EAL,
"%s(): cannot parse size of %s\n", __func__, dirname);
-   return (-1);
+   return -1;
}

/* get mapping physical address */
@@ -85,20 +85,20 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
RTE_LOG(ERR, EAL,
"%s(): cannot parse addr of %s\n", __func__, dirname);
-   return (-1);
+   return -1;
}

if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
RTE_LOG(ERR, EAL,
"%s(): offset/size exceed system max value\n", __func__);
-   return (-1);
+   return -1;
}

maps[i].offset = offset;
maps[i].size = size;
}

-   return (i);
+   return i;
 }

 static int
@@ -128,12 +128,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
close(fd);
-   return (-1);
+   return -1;
}
/* fd is not needed in slave process, close it */
close(fd);
}
-   return (0);
+   return 0;
}

RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -277,7 +277,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {

/* secondary processes - use already recorded details */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-   return (pci_uio_map_secondary(dev));
+   return pci_uio_map_secondary(dev);

/* find uio resource */
uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -299,7 +299,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
/* allocate the mapping details for secondary processes*/
if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
-   return (-1);
+   return -1;
}

rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -310,7 +310,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
RTE_DIM(uio_res->maps));
if (nb_maps < 0) {
rte_free(uio_res);
-   return (nb_maps);
+   return nb_maps;
}

uio_res->nb_maps = nb_maps;
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 13/20] vfio: add multiprocess support.

2014-06-03 Thread Anatoly Burakov
Since VFIO cannot be used to map the same device twice, secondary
processes receive the container/group fd's by means of communicating over a
local socket. Only group and container fd's should be sent, as device
fd's can be obtained via ioctl() calls on the group fd.

For multiprocess, VFIO distinguishes between existing but unused groups
(e.g. groups that aren't bound to the VFIO driver) and non-existing groups in
order to know if the secondary process requests a valid group, or if
it requests something that doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests - request for group fd, and request for container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent and the socket is closed.
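
The reply half of this exchange can be sketched with SCM_RIGHTS ancillary
data over a Unix domain socket. This is a simplified illustration, not the
patch's implementation: the names and numeric values below are hypothetical,
and the sketch carries the status code and the fd in a single message, whereas
the protocol above describes SOCKET_OK as being followed by the fd.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical reply codes mirroring the names used in the text */
enum sketch_reply {
	SKETCH_SOCKET_OK,
	SKETCH_SOCKET_NO_FD,
	SKETCH_SOCKET_ERR
};

/* control buffer big enough for one fd, aligned for struct cmsghdr */
union sketch_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* primary side: send a status; attach the fd only for SKETCH_SOCKET_OK */
int
sketch_send_reply(int sock, int status, int fd)
{
	struct iovec iov = { .iov_base = &status, .iov_len = sizeof(status) };
	union sketch_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	if (status == SKETCH_SOCKET_OK) {
		msg.msg_control = ctrl.buf;
		msg.msg_controllen = sizeof(ctrl.buf);
		cmsg = CMSG_FIRSTHDR(&msg);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
	}
	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(status) ? 0 : -1;
}

/* secondary side: returns the status; *fd is valid only on OK */
int
sketch_recv_reply(int sock, int *fd)
{
	int status = SKETCH_SOCKET_ERR;
	struct iovec iov = { .iov_base = &status, .iov_len = sizeof(status) };
	union sketch_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	*fd = -1;
	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(status))
		return SKETCH_SOCKET_ERR;
	if (status != SKETCH_SOCKET_OK)
		return status;

	/* OK without a passed descriptor is treated as an error */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return SKETCH_SOCKET_ERR;
	memcpy(fd, CMSG_DATA(cmsg), sizeof(*fd));
	return SKETCH_SOCKET_OK;
}
```

The received descriptor is a new fd number in the secondary process that
refers to the same open file as in the primary, which is what lets a
secondary process reuse the container or group without opening it again.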

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  79 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index cf9f026..3c05edf 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index e1d6973..f0d4f55 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -303,7 +303,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }

 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
int ret, vfio_container_fd;
@@ -333,13 +333,36 @@ pci_vfio_get_container_fd(void)
}

return vfio_container_fd;
+   } else {
+   /*
+* if we're in a secondary process, request container fd from the
+* primary process via our socket
+*/
+   int socket_fd;
+   if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+   if (vfio_container_fd < 0) {
+   RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+   close(socket_fd);
+   return -1;
+   }
+   close(socket_fd);
+   return vfio_container_fd;
}

return -1;
 }

 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
int i;
@@ -375,6 +398,44 @@ pci_vfio_get_group_fd(int iommu_group_no)
vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
return vfio_group_fd;
}
+   /* if we're in a secondary process, request group fd from the primary
+* process via our socket
+*/
+   else {
+   int socket_fd, ret;
+   if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+   return -1;
+   }
+   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+  

[dpdk-dev] [PATCH v4 18/20] igb_uio: Removed PCI ID table from igb_uio

2014-06-03 Thread Anatoly Burakov
Removing PCI ID list to make igb_uio more similar to a generic driver
like vfio-pci or uio_pci_generic. This is done to make it easier for
the binding script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to PCI stub, VFIO and other generic
PCI drivers.

Therefore, to bind a new device to igb_uio, the user will now have to
first write its PCI ID (vendor and device) to the "new_id" file inside the
igb_uio driver directory, and only then write the device's PCI address to
"bind". This is reflected in changes to the PCI binding script as well.

sysfs behaves oddly when a new device ID is added to new_id: a
subsequent write to "bind" results in an IOError when the file is
closed. The error is harmless, but it still raises an exception, so
to work around it we check whether the device was actually bound to
the driver before raising an error.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-
 tools/igb_uio_bind.py | 118 +++---
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 7d5e6b4..6362b1c 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;

-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include 
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)

 static struct pci_driver igbuio_pci_driver = {
.name = "igb_uio",
-   .id_table = igbuio_pci_ids,
+   .id_table = NULL,
.probe = igbuio_pci_probe,
.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 824aa2b..33adcf4 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []

 def usage():
 '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
 return path

 def check_modules():
-'''Checks that the needed modules (igb_uio) is loaded, and then
-determine from the .ko file, what its supported device ids are'''
-global module_dev_ids
+'''Checks that igb_uio is loaded'''

 fd = file("/proc/modules")
 loaded_mods = fd.readlines()
@@ -165,41 +161,36 @@ def check_modules():
 if not found:
 print "Error - module %s not loaded" %mod
 sys.exit(1)
-
-# now find the .ko and get list of supported vendor/dev-ids
-modpath = find_module(mod)
-if modpath is None:
-print "Cannot find module file %s" % (mod + ".ko")
-sys.exit(1)
-depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-for line in depmod_output:
-if not line.startswith("alias"):
-continue
-if not line.endswith(mod):
-continue
-lineparts = line.split()
-if not(lineparts[1].startswith("pci:")):
-continue;
-else:
-lineparts[1] = lineparts[1][4:]
-vendor = lineparts[1][:9]
-device = lineparts[1][9:18]
-if vendor.startswith("v") and device.startswith("d"):
-module_dev_ids.append({"Vendor": int(vendor[1:],16), 
-   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-'''return true if device is supported by igb_uio, false otherwise'''
-for dev in module_dev_ids:
-if (dev["Vendor"] == devices[dev_id]["Vendor"] and 
-dev["Device"] == devices[dev_id]["Device"]):
-return True
-return False

 def has_driver(dev_id):
 '''return true if a device is assigned to a driver. False otherwise'''
 return "Driver_str" in

[dpdk-dev] [PATCH v4 01/20] pci: move open() out of pci_map_resource, rename structs

2014-06-03 Thread Anatoly Burakov
Separating mapping code and calls to open(). This is preparatory work
for the VFIO patch, which also needs to map BARs but does not use the
path in mapped_pci_resource. Also renaming structs to be more generic.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 --
 1 file changed, 58 insertions(+), 67 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..fd88bd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 

-#include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
 #include 

 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */

-struct uio_map {
+struct pci_map {
void *addr;
uint64_t offset;
uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-   TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+   TAILQ_ENTRY(mapped_pci_resource) next;

struct rte_pci_addr pci_addr;
char path[PATH_MAX];
-   size_t nb_maps;
-   struct uio_map maps[PCI_MAX_RESOURCE];
+   int nb_maps;
+   struct pci_map maps[PCI_MAX_RESOURCE];
 };

-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;

-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);

 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:

 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-   int fd;
void *mapaddr;

-   /*
-* open devname, to mmap it
-*/
-   fd = open(devname, O_RDWR);
-   if (fd < 0) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-   devname, strerror(errno));
-   goto fail;
-   }
-
/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, offset);
-   close(fd);
if (mapaddr == MAP_FAILED ||
(requested_addr != NULL && mapaddr != requested_addr)) {
-   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-   " %s (%p)\n", __func__, devname, fd, requested_addr,
+   RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+   __func__, fd, requested_addr,
(unsigned long)size, (unsigned long)offset,
strerror(errno), mapaddr);
goto fail;
@@ -186,10 +148,10 @@ fail:
 }

 #define OFF_MAX  ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-   size_t i;
+   int i;
char dirname[PATH_MAX];
char filename[PATH_MAX];
uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-size_t i;
-struct uio_resource *uio_res;
+   int fd, i;
+   struct mapped_pci_resource *uio_res;

-   TAILQ_FOREACH(uio_res, uio_res_list, next) {
+   TAILQ_FOREACH(uio_res, pci_res_list, next) {

/* skip this element if it doesn't match our PCI address */
if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
c

[dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING

2014-06-03 Thread Anatoly Burakov
Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic.

Signed-off-by: Anatoly Burakov 
---
 app/test/test_pci.c | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c| 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c   | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 4 ++--
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
  struct rte_pci_device *dev);

 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */

@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
.name = "test_driver",
.devinit = my_driver_init,
.id_table = my_driver_id,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };

 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
return 0;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
if (pci_uio_map_resource(dev) < 0)
return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index c793773..11b8c13 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
uint32_t drv_flags; /**< Flags contolling handling of device. */
 };

-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 0b779ec..a0abec8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
return 1;
}

-   if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+   if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
/* map resources for devices that use igb_uio */
if ((ret = pci_uio_map_resource(dev)) != 0)
return ret;
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 493806c..c8355bc 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -280,7 +280,7 @@ static struct eth_driver rte_em_pmd = {
{
.name = "rte_em_pmd",
.id_table = pci_id_em_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_em_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index 5f93bcf..d60f923 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -603,7 +603,7 @@ static struct eth_driver rte_igb_pmd = {
{
.name = "rte_igb_pmd",
.id_table = pci_id_igb_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igb_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
@@ -616,7 +616,7 @@ static struct eth_driver rte_igbvf_pmd = {
{
.name = "rte_igbvf_pmd",
.id_table = pci_id_igbvf_map,
-   .drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
},
.eth_dev_init = eth_igbvf_dev_init,
.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c

[dpdk-dev] [PATCH v4 10/20] interrupts: Add support for VFIO interrupts

2014-06-03 Thread Anatoly Burakov
Creating code to handle VFIO interrupts in EAL interrupts (supports all
types of interrupts).

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 285 -
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 284 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..c430710 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -44,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -66,6 +66,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_vfio.h"

 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)

@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
int uio_intr_count;  /* for uio device */
+#ifdef VFIO_PRESENT
+   uint64_t vfio_intr_count;/* for vfio device */
+#endif
uint64_t timerfd_num;/* for timerfd */
char charbuf[16];/* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;

+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   /* enable INTx */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+   fd_ptr = (int *) &irq_set->data;
+   *fd_ptr = intr_handle->fd;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* unmask INTx after enabling */
+   memset(irq_set, 0, len);
+   len = sizeof(struct vfio_irq_set);
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+   struct vfio_irq_set *irq_set;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   int len, ret;
+
+   len = sizeof(struct vfio_irq_set);
+
+   /* mask interrupts before disabling */
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+   intr_handle->fd);
+   return -1;
+   }
+
+   /* disable INTx*/
+   memset(irq_set, 0, len);
+   irq_set->argsz = len;
+   irq_set->count = 0;
+   irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+   irq_set->start = 0;
+
+   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+   if (ret) {
+   RTE_LOG(ERR, EAL,
+   "Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+   return -1;
+   }
+   return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+   int len, ret;
+   char irq_set_buf[IRQ_SET_BUF_LEN];
+   struct vfio_irq_set *irq_set;
+   int *fd_ptr;
+
+   len = sizeof(irq_set_buf);
+
+   irq_set = (struct vfio_irq_set *) irq_set_buf;
+   irq_set->argsz = len;
+   irq_set->count = 1;
+   irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+   irq_set->start = 0;
+  
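
For reference, the INTx teardown in vfio_disable_intx() earlier in this patch issues two VFIO_DEVICE_SET_IRQS requests: one that masks the interrupt, then one with ACTION_TRIGGER and a count of 0 that disables it. The sketch below only fills in the two request structures so the flag combinations can be compared side by side; it performs no ioctl(). The function names are hypothetical and not part of the patch, while the structure and flag names come from <linux/vfio.h>.

```c
#include <assert.h>
#include <string.h>
#include <linux/vfio.h>

/* Hypothetical helper: fill a request that masks the INTx interrupt. */
void
sketch_fill_intx_mask(struct vfio_irq_set *irq_set)
{
	assert(irq_set != NULL);
	memset(irq_set, 0, sizeof(*irq_set));
	irq_set->argsz = sizeof(*irq_set);
	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
	irq_set->start = 0;
	irq_set->count = 1;
}

/* Hypothetical helper: fill a request that disables INTx (count == 0). */
void
sketch_fill_intx_disable(struct vfio_irq_set *irq_set)
{
	assert(irq_set != NULL);
	memset(irq_set, 0, sizeof(*irq_set));
	irq_set->argsz = sizeof(*irq_set);
	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
	irq_set->start = 0;
	irq_set->count = 0;
}
```

A caller would pass each filled structure to ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set) in that order and check the return value after each call, as the patch does.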

[dpdk-dev] [PATCH v4 20/20] setup script: adding support for VFIO to setup.sh

2014-06-03 Thread Anatoly Burakov
Support for loading/unloading VFIO drivers, binding/unbinding devices
to/from VFIO, also setting up correct userspace permissions.
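
The memlock warning that this patch adds to setup.sh (via `ulimit -l`, which reports kilobytes) could also be checked from inside a C program. The following is a minimal sketch, assuming the same 64 MB threshold the script uses; the function and macro names are hypothetical, and only getrlimit() and RLIMIT_MEMLOCK are real POSIX/Linux interfaces.

```c
#include <sys/resource.h>

/* Threshold taken from the script: 65536 KB, expressed here in bytes. */
#define SKETCH_MIN_MEMLOCK_BYTES (64UL * 1024 * 1024)

/* Return 1 if a limit (in bytes) is below min_bytes, 0 otherwise.
 * An unlimited limit never triggers the warning. */
int
sketch_memlock_too_small(rlim_t limit, rlim_t min_bytes)
{
	if (limit == RLIM_INFINITY)
		return 0;
	return limit < min_bytes;
}

/* Check the soft memlock limit of the calling process.
 * Returns 1 if a warning is warranted, 0 if not, -1 if getrlimit() failed. */
int
sketch_check_memlock(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	return sketch_memlock_too_small(rl.rlim_cur, SKETCH_MIN_MEMLOCK_BYTES);
}
```

Note that getrlimit() reports bytes, so the comparison value differs from the 65536 used in the shell script even though both describe 64 MB.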

Signed-off-by: Anatoly Burakov 
---
 tools/setup.sh | 156 +++--
 1 file changed, 141 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index e0671b8..3991da9 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }

 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+   echo "Unloading any existing VFIO module"
+   /sbin/lsmod | grep -s vfio > /dev/null
+   if [ $? -eq 0 ] ; then
+   sudo /sbin/rmmod vfio-pci
+   sudo /sbin/rmmod vfio_iommu_type1
+   sudo /sbin/rmmod vfio
+   fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+   remove_vfio_module
+
+   VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+   echo "Loading VFIO module"
+   /sbin/lsmod | grep -s vfio_pci > /dev/null
+   if [ $? -ne 0 ] ; then
+   if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+   sudo /sbin/modprobe vfio-pci
+   fi
+   fi
+
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # check if /dev/vfio/vfio exists - that way we
+   # know we either loaded the module, or it was
+   # compiled into the kernel
+   if [ ! -e /dev/vfio/vfio ] ; then
+   echo "## ERROR: VFIO not found!"
+   fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }

 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+   # make sure regular users can read /dev/vfio
+   echo "chmod /dev/vfio"
+   sudo /usr/bin/chmod a+x /dev/vfio
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # make sure regular user can access everything inside /dev/vfio
+   echo "chmod /dev/vfio/*"
+   sudo /usr/bin/chmod 0666 /dev/vfio/*
+   if [ $? -ne 0 ] ; then
+   echo "FAIL"
+   quit
+   fi
+   echo "OK"
+
+   # since permissions are only to be set when running as
+   # regular user, we only check ulimit here
+   #
+   # warn if regular user is only allowed
+   # to memlock <64M of memory
+   MEMLOCK_AMNT=`ulimit -l`
+
+   if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+   MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+   echo ""
+   echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+   echo ""
+   echo "This is the maximum amount of memory you will be"
+   echo "able to use with DPDK and VFIO if run as current user."
+   echo -n "To change this, please adjust limits.conf memlock "
+   echo "limit for current user."
+
+   if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+   echo ""
+   echo "## WARNING: memlock limit is less than 64MB"
+   echo -n "## DPDK with VFIO may not be able to initialize "
+   echo "if run as current user."
+   fi
+   fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,24 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+   if /sbin/lsmod  | grep -q vfio_pci ; then
+   ${RTE_SDK}/tools/dpdk_nic_bind.py --status
+   echo ""
+   echo -n "Enter PCI address of device to bind to VFIO driver: "
+   read PCI_PATH
+   sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && echo "OK"
+   else
+   echo "# Please load the 'vfio-pci' kernel module before querying or "
+   echo "# adjusting NIC device bindings"
+   fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
if  /sbin/lsmod  | grep -q igb_uio ; then 
${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +511,29 @@ step2_func()
TEXT[1]="Insert IGB UIO module"
FUNC[1]="load_igb_uio_module"

-   TEXT[2]="Insert KNI module"
-   FUNC[2]="load_kni_module"
+   TEXT[2]="Insert VFIO module"
+   FUNC[2]="load_vfio_module"
+
+   TEXT[3]="Insert KNI module"
+   FUNC[3]="load_kni_module"

-   TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-   FUNC[3]="set_non_numa_pages"
+   TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+   FUNC[4]="set_non_numa_pages"

-   TEXT[4]="Setup hugepage mappings for NUMA systems"
-   FUNC[4]="

[dpdk-dev] [PATCH v4 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line

2014-06-03 Thread Anatoly Burakov
Unlike igb_uio, VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled, but will set VFIO interrupt type to either "legacy", "msi"
or "msix" if VFIO support is compiled. Note that VFIO initialization
will fail if the interrupt type selected is not supported by the system.

If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index aeb5903..10c40fa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0 "xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR "vfio-intr"

 #define RTE_EAL_BLACKLIST_SIZE 0x100

@@ -361,6 +362,8 @@ eal_usage(const char *prgname)
   "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
   "native RDTSC\n"
   "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+  "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO "
+  "(legacy|msi|msix)\n"
   "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +582,28 @@ eal_parse_base_virtaddr(const char *arg)
return 0;
 }

+static int
+eal_parse_vfio_intr(const char *mode)
+{
+   unsigned i;
+   static struct {
+   const char *name;
+   enum rte_intr_mode value;
+   } map[] = {
+   { "legacy", RTE_INTR_MODE_LEGACY },
+   { "msi", RTE_INTR_MODE_MSI },
+   { "msix", RTE_INTR_MODE_MSIX },
+   };
+
+   for (i = 0; i < RTE_DIM(map); i++) {
+   if (!strcmp(mode, map[i].name)) {
+   internal_config.vfio_intr_mode = map[i].value;
+   return 0;
+   }
+   }
+   return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +658,7 @@ eal_parse_args(int argc, char **argv)
{OPT_PCI_BLACKLIST, 1, 0, 0},
{OPT_VDEV, 1, 0, 0},
{OPT_SYSLOG, 1, NULL, 0},
+   {OPT_VFIO_INTR, 1, NULL, 0},
{OPT_BASE_VIRTADDR, 1, 0, 0},
{OPT_XEN_DOM0, 0, 0, 0},
{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -829,6 +855,14 @@ eal_parse_args(int argc, char **argv)
return -1;
}
}
+   else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+   if (eal_parse_vfio_intr(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameters for --"
+   OPT_VFIO_INTR "\n");
+   eal_usage(prgname);
+   return -1;
+   }
+   }
else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
internal_config.create_uio_dev = 1;
}
-- 
1.8.1.4
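
As a standalone illustration of the behaviour described in the commit message above, the sketch below pairs a table-driven parser (modelled on eal_parse_vfio_intr()) with a selection step. The enum, the function names and the "supported" bitmask are hypothetical stand-ins rather than EAL definitions. The message only states that auto-detection starts with MSI-X; the MSI-then-legacy order that follows is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for enum rte_intr_mode; NONE means "not specified". */
enum sketch_intr_mode {
	SKETCH_INTR_NONE = 0,
	SKETCH_INTR_LEGACY,
	SKETCH_INTR_MSI,
	SKETCH_INTR_MSIX
};

/* Map a --vfio-intr argument to a mode. Returns 0 on success, -1 if unknown. */
int
sketch_parse_vfio_intr(const char *arg, enum sketch_intr_mode *out)
{
	static const struct {
		const char *name;
		enum sketch_intr_mode value;
	} map[] = {
		{ "legacy", SKETCH_INTR_LEGACY },
		{ "msi", SKETCH_INTR_MSI },
		{ "msix", SKETCH_INTR_MSIX },
	};
	size_t i;

	assert(arg != NULL && out != NULL);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(arg, map[i].name) == 0) {
			*out = map[i].value;
			return 0;
		}
	}
	return -1;
}

/*
 * Pick the mode to use. "supported" has bit (1u << mode) set for every mode
 * the system offers. An explicit request must be supported, otherwise
 * SKETCH_INTR_NONE is returned (initialization would fail). With no request,
 * try MSI-X first, then MSI, then legacy (assumed order).
 */
enum sketch_intr_mode
sketch_select_intr(enum sketch_intr_mode requested, unsigned supported)
{
	static const enum sketch_intr_mode order[] = {
		SKETCH_INTR_MSIX, SKETCH_INTR_MSI, SKETCH_INTR_LEGACY
	};
	size_t i;

	if (requested != SKETCH_INTR_NONE)
		return (supported & (1u << requested)) ?
			requested : SKETCH_INTR_NONE;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		if (supported & (1u << order[i]))
			return order[i];
	return SKETCH_INTR_NONE;
}
```

On the command line the parameter would be given as, for example, `--vfio-intr msi`.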



[dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library

2014-06-03 Thread Neil Horman
On Mon, Jun 02, 2014 at 09:40:04PM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Thursday, May 29, 2014 6:48 AM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor 
> > library
> > 
> > > +
> > > +/* flush the distributor, so that there are no outstanding packets in flight or
> > > + * queued up. */
> > It's not clear to me that this is a distributor-only function.  You modified
> > the comments to indicate that lcores can't perform double duty as both a
> > worker and a distributor, which is fine, but it implies that there is a clear
> > distinction between functions that are 'worker' functions and 'distributor'
> > functions.  While it's for the most part clear-ish (workers call
> > rte_distributor_get_pkt and rte_distributor_return_pkt, distributors call
> > rte_distributor_create/process), this is in a grey area.  The analogy I'm
> > thinking of here is kernel workqueues.  There's a specific workqueue thread
> > that processes the workqueue, but any process can sync or flush the
> > workqueue, leading me to think this process can be called by a worker lcore.
> 
> I can update comments here further, but I was hoping the way things were 
> right now was clear enough. In the header and C files, I have the functions 
> explicitly split up into distributor and worker function sets, with a big 
> block of text in the header at the start of each section explaining the 
> threading use of the following functions.
> 
Very well, we can let use be the determinant here.  We can leave it as is, and
if reports of lockups come in, we can change it, otherwise no harm done.

> > 
> > > +int
> > > +rte_distributor_flush(struct rte_distributor *d)
> > > +{
> > > + unsigned wkr, total_outstanding = 0;
> > > + unsigned flushed = 0;
> > > + unsigned ret_start = d->returns.start,
> > > + ret_count = d->returns.count;
> > > +
> > > + for (wkr = 0; wkr < d->num_workers; wkr++)
> > > + total_outstanding += d->backlog[wkr].count +
> > > + !!(d->in_flight_tags[wkr]);
> > > +
> > > + wkr = 0;
> > > + while (flushed < total_outstanding) {
> > > +
> > > + if (d->in_flight_tags[wkr] != 0 || d->backlog[wkr].count) {
> > > + const int64_t data = d->bufs[wkr].bufptr64;
> > > + uintptr_t oldbuf = 0;
> > > +
> > > + if (data & RTE_DISTRIB_GET_BUF) {
> > > + flushed += (d->in_flight_tags[wkr] != 0);
> > > + if (d->backlog[wkr].count) {
> > > + d->bufs[wkr].bufptr64 =
> > > + backlog_pop(&d->backlog[wkr]);
> > > + /* we need to mark something as being
> > > +  * in-flight, but it doesn't matter what
> > > +  * as we never check it except
> > > +  * to check for non-zero.
> > > +  */
> > > + d->in_flight_tags[wkr] = 1;
> > > + } else {
> > > + d->bufs[wkr].bufptr64 =
> > > + RTE_DISTRIB_GET_BUF;
> > > + d->in_flight_tags[wkr] = 0;
> > > + }
> > > + oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
> > > + } else if (data & RTE_DISTRIB_RETURN_BUF) {
> > > + if (d->backlog[wkr].count == 0 ||
> > > + move_backlog(d, wkr) == 0) {
> > > + /* only if we move backlog,
> > > +  * process this packet */
> > > + d->bufs[wkr].bufptr64 = 0;
> > > + oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
> > > + flushed++;
> > > + d->in_flight_tags[wkr] = 0;
> > > + }
> > > + }
> > > +
> > > + store_return(oldbuf, d, &ret_start, &ret_count);
> > > + }
> > > +
> > I know the comments for move_backlog say you use that function here rather
> > than what you do in distributor_process because you're tracking the flush
> > count here.  That said, if you instead recomputed the total_outstanding
> > count on each loop iteration, and tested it for 0, I think you could just
> > reduce the flush operation to a looping call to rte_distributor_process.
> > It would save you having to maintain the flush code and the move_backlog
> > code separately, which would be a nice savings.
> 
> Yes, agreed, I should have spotted that myself. I'll look to rework this as 
> soon as I can.
> 
Ok, thanks.


[dpdk-dev] veth interfaces

2014-06-03 Thread Ivano Cerrato
Hello,
thanks for the answer.

Using the KNI API, I create a new interface, and then I "push it into 
the container".
At this point, from the container, I send traffic and it is received in 
the KNI example program.

Instead, in my understanding, it is not possible to attach KNI to an 
already existing interface. Am I right?

Thank you,

 Ivano

On 30/05/2014 14:49, Zhu, Heqing wrote:
> Hi Ivano,
>
> I think you can use the KNI, there is example/kni and doc. Please visit if 
> you have not done it. Please keep us updated on your journey with Docker 
> containers.  :-)
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ivano Cerrato
> Sent: Friday, May 30, 2014 3:02 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] veth interfaces
>
> Hello,
> I apologize for the imprecise email I sent, and now I try to be more specific 
> about what I would like to do.
>
> I have a module that receives traffic from the network using DPDK and, based 
> on the packet content, should provide packets to the proper Docker container.
> Docker containers receive packets through a veth interface.
>
> Can I use DPDK features, in particular the KNI, to send packets on a veth 
> interface?
> If the KNI is not appropriate, is there something else that I could use?
>
> Regards,
>
> Ivano



[dpdk-dev] [PATCH]xen:support Dom0 driver for Linux kernel 3.13.0 and later

2014-06-03 Thread Jijiang Liu
Since Linux kernel 3.13.0, the xen_create_contiguous_region() and
xen_destroy_contiguous_region() APIs have changed: their first parameter
is now a physical address.

Signed-off-by: Jijiang Liu 
Acked-by: Huawei Xie 
Tested-by: Heng Ding 
---
 lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c |   24 ++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c b/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
index c254300..a91c7ec 100644
--- a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
+++ b/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -262,6 +263,7 @@ dom0_memory_free(uint32_t rsv_size)
for (i = 0; i < dom0_dev.num_bigblock * 2; i += 2) {
vstart = rsv_mm_info[i].vir_addr;
if (vstart) {
+   #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 13, 0)
if (rsv_mm_info[i].exchange_flag)
xen_destroy_contiguous_region(vstart,
DOM0_CONTIG_NUM_ORDER);
@@ -269,6 +271,17 @@ dom0_memory_free(uint32_t rsv_size)
xen_destroy_contiguous_region(vstart +
DOM0_MEMBLOCK_SIZE,
DOM0_CONTIG_NUM_ORDER);
+   #else
+   if (rsv_mm_info[i].exchange_flag)
+   xen_destroy_contiguous_region(rsv_mm_info[i].pfn
+   * PAGE_SIZE,
+   DOM0_CONTIG_NUM_ORDER);
+   if (rsv_mm_info[i + 1].exchange_flag)
+   xen_destroy_contiguous_region(rsv_mm_info[i].pfn
+   * PAGE_SIZE + DOM0_MEMBLOCK_SIZE,
+   DOM0_CONTIG_NUM_ORDER);
+   #endif
+
size = DOM0_MEMBLOCK_SIZE * 2;
vaddr = vstart;
while (size > 0) {
@@ -381,6 +394,10 @@ dom0_memory_reserve(uint32_t rsv_size)
uint64_t pfn, vstart, vaddr;
uint32_t i, num_block, size, allocated_size = 0;

+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 13, 0)
+   dma_addr_t dma_handle;
+#endif
+
/* 2M as memory block */
num_block = rsv_size / SIZE_PER_BLOCK;

@@ -452,8 +469,13 @@ dom0_memory_reserve(uint32_t rsv_size)
 * This API is used to exchage MFN for getting a block of  
 * contiguous physical addresses, its maximum size is 2M.  
 */
+   #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 13, 0)
if (xen_create_contiguous_region(rsv_mm_info[i].vir_addr,
-   DOM0_CONTIG_NUM_ORDER, 0) == 0) {
+   DOM0_CONTIG_NUM_ORDER, 0) == 0) {
+   #else
+   if (xen_create_contiguous_region(rsv_mm_info[i].pfn * PAGE_SIZE,
+   DOM0_CONTIG_NUM_ORDER, 0, &dma_handle) == 0) {
+   #endif
rsv_mm_info[i].exchange_flag = 1;
rsv_mm_info[i].mfn =
pfn_to_mfn(rsv_mm_info[i].pfn);
-- 
1.7.7.6
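
The version switch in the patch above boils down to choosing a different first argument depending on the kernel version. The sketch below expresses that choice as an ordinary userspace function so it can be read in isolation. The function name is hypothetical, and a run-time comparison replaces the compile-time #if purely for illustration; the macro uses the same encoding as the kernel's KERNEL_VERSION().

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Value to pass as the first argument of the contiguous-region calls:
 * the virtual start address before 3.13.0, and the physical address
 * (pfn * page size) from 3.13.0 onwards.
 */
uint64_t
sketch_region_first_arg(unsigned version_code, uint64_t vir_addr,
		uint64_t pfn, uint64_t page_size)
{
	assert(page_size != 0);
	if (version_code < SKETCH_KERNEL_VERSION(3, 13, 0))
		return vir_addr;
	return pfn * page_size;
}
```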



[dpdk-dev] [PATCH]xen:fix an issue about memory size calculation in Dom0 driver

2014-06-03 Thread Jijiang Liu
The unit of allocated_size is MB, so it is now increased by SIZE_PER_BLOCK
instead of DOM0_MEMBLOCK_SIZE. Otherwise, freeing the memory fails when
not enough memory is available.

Signed-off-by: Jijiang Liu 
Acked-by: Huawei Xie 
Tested-by: Heng Ding  
---
 lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c b/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
index a91c7ec..0f87905 100644
--- a/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
+++ b/lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c
@@ -447,7 +447,7 @@ dom0_memory_reserve(uint32_t rsv_size)
return -ENOMEM;
}

-   allocated_size += DOM0_MEMBLOCK_SIZE;
+   allocated_size += SIZE_PER_BLOCK;

size = DOM0_MEMBLOCK_SIZE;
vaddr = vstart;
-- 
1.7.7.6
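
A small worked example of the unit mismatch this patch fixes. The two constants below are assumptions chosen for illustration (a 2 MB block expressed once in bytes and once in MB); their real definitions are not shown in the patch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; neither definition appears in the patch. */
#define SKETCH_MEMBLOCK_SIZE_BYTES 0x200000u /* one 2 MB block, in bytes */
#define SKETCH_SIZE_PER_BLOCK_MB   2u        /* the same block, in MB */

/* Counter value after n blocks when a byte size is added per block. */
uint32_t
sketch_allocated_before(uint32_t n)
{
	assert(n <= UINT32_MAX / SKETCH_MEMBLOCK_SIZE_BYTES);
	return n * SKETCH_MEMBLOCK_SIZE_BYTES;
}

/* Counter value after n blocks when an MB size is added per block. */
uint32_t
sketch_allocated_after(uint32_t n)
{
	assert(n <= UINT32_MAX / SKETCH_SIZE_PER_BLOCK_MB);
	return n * SKETCH_SIZE_PER_BLOCK_MB;
}
```

Under these assumptions, four reserved blocks give a counter of 8 (MB) with the fix, but 8388608 without it, so any later code that treats the counter as MB would see a wildly wrong size.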



[dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library

2014-06-03 Thread Richardson, Bruce


> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, June 03, 2014 4:01 AM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor 
> library
> 
> On Mon, Jun 02, 2014 at 09:40:04PM +, Richardson, Bruce wrote:
> >
> >
> > > -Original Message-
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Thursday, May 29, 2014 6:48 AM
> > > To: Richardson, Bruce
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library
> > >
> > > > +
> > > > +/* flush the distributor, so that there are no outstanding packets in flight or
> > > > + * queued up. */
> > > It's not clear to me that this is a distributor-only function.  You
> > > modified the comments to indicate that lcores can't perform double duty as
> > > both a worker and a distributor, which is fine, but it implies that there
> > > is a clear distinction between functions that are 'worker' functions and
> > > 'distributor' functions.  While it's for the most part clear-ish (workers
> > > call rte_distributor_get_pkt and rte_distributor_return_pkt, distributors
> > > call rte_distributor_create/process), this is in a grey area.  The analogy
> > > I'm thinking of here is kernel workqueues.  There's a specific workqueue
> > > thread that processes the workqueue, but any process can sync or flush the
> > > workqueue, leading me to think this process can be called by a worker
> > > lcore.
> >
> > I can update comments here further, but I was hoping the way things were
> > right now was clear enough. In the header and C files, I have the functions
> > explicitly split up into distributor and worker function sets, with a big
> > block of text in the header at the start of each section explaining the
> > threading use of the following functions.
> >
> Very well, we can let use be the determinant here.  We can leave it as is, and
> if reports of lockups come in, we can change it, otherwise no harm done.
> 
Since I'm not a big fan of the "let's wait for the lock-ups" approach, I'll add 
in a single-line addition to each function's doxygen comment that should make 
its way into the official API docs. :-)


[dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library

2014-06-03 Thread Neil Horman
On Tue, Jun 03, 2014 at 02:33:16PM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Tuesday, June 03, 2014 4:01 AM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor 
> > library
> > 
> > On Mon, Jun 02, 2014 at 09:40:04PM +, Richardson, Bruce wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > > Sent: Thursday, May 29, 2014 6:48 AM
> > > > To: Richardson, Bruce
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library
> > > >
> > > > > +
> > > > > +/* flush the distributor, so that there are no outstanding packets in flight or
> > > > > + * queued up. */
> > > > It's not clear to me that this is a distributor-only function.  You
> > > > modified the comments to indicate that lcores can't perform double duty
> > > > as both a worker and a distributor, which is fine, but it implies that
> > > > there is a clear distinction between functions that are 'worker'
> > > > functions and 'distributor' functions.  While it's for the most part
> > > > clear-ish (workers call rte_distributor_get_pkt and
> > > > rte_distributor_return_pkt, distributors call
> > > > rte_distributor_create/process), this is in a grey area.  The analogy I'm
> > > > thinking of here is kernel workqueues.  There's a specific workqueue
> > > > thread that processes the workqueue, but any process can sync or flush
> > > > the workqueue, leading me to think this process can be called by a
> > > > worker lcore.
> > >
> > > I can update comments here further, but I was hoping the way things were
> > > right now was clear enough. In the header and C files, I have the
> > > functions explicitly split up into distributor and worker function sets,
> > > with a big block of text in the header at the start of each section
> > > explaining the threading use of the following functions.
> > >
> > Very well, we can let use be the determinant here.  We can leave it as is,
> > and if reports of lockups come in, we can change it, otherwise no harm done.
> > 
> Since I'm not a big fan of the "let's wait for the lock-ups" approach, I'll 
> add in a single-line addition to each function's doxygen comment that should 
> make its way into the official API docs. :-)
> 
If you're planning on collapsing the flush routine into an iterative call to
distributor_process anyway, then, sure, I'd appreciate it.  

Thanks!
Neil



[dpdk-dev] [PATCH v3 2/5] distributor: new packet distributor library

2014-06-03 Thread Bruce Richardson
This adds the code for a new Intel DPDK library for packet distribution.
The distributor is a component which is designed to pass packets
one-at-a-time to workers, with dynamic load balancing. Using the RSS
field in the mbuf as a tag, the distributor tracks what packet tag is
being processed by what worker and then ensures that no two packets with
the same tag are in-flight simultaneously. Once a tag is not in-flight,
then the next packet with that tag will be sent to the next available
core.

Changes in V2 patch:
* added support for a partial distributor flush when process() API
  called without any new mbufs
* Removed unused "future use" parameters from functions
* Improved comments to be clearer about thread safety
* Add locks around the tailq add in create() API fn
* Stylistic improvements for issues flagged by checkpatch

Changes in V3 patch:
* Flush function rewritten as calls to process API
* Additional doxygen comments on thread-safety of APIs

Signed-off-by: Bruce Richardson 
---
 lib/librte_distributor/Makefile  |  50 
 lib/librte_distributor/rte_distributor.c | 425 +++
 lib/librte_distributor/rte_distributor.h | 194 ++
 3 files changed, 669 insertions(+)
 create mode 100644 lib/librte_distributor/Makefile
 create mode 100644 lib/librte_distributor/rte_distributor.c
 create mode 100644 lib/librte_distributor/rte_distributor.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
new file mode 100644
index 000..36699f8
--- /dev/null
+++ b/lib/librte_distributor/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_distributor.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+
+# this lib needs eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_mbuf
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
new file mode 100644
index 000..5eee442
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.c
@@ -0,0 +1,425 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS

[dpdk-dev] [PATCH v3 2/5] distributor: new packet distributor library

2014-06-03 Thread Neil Horman
On Tue, Jun 03, 2014 at 07:04:12PM +0100, Bruce Richardson wrote:
> This adds the code for a new Intel DPDK library for packet distribution.
> The distributor is a component which is designed to pass packets
> one-at-a-time to workers, with dynamic load balancing. Using the RSS
> field in the mbuf as a tag, the distributor tracks what packet tag is
> being processed by what worker and then ensures that no two packets with
> the same tag are in-flight simultaneously. Once a tag is not in-flight,
> then the next packet with that tag will be sent to the next available
> core.
> 
> Changes in V2 patch:
> * added support for a partial distributor flush when process() API
>   called without any new mbufs
> * Removed unused "future use" parameters from functions
> * Improved comments to be clearer about thread safety
> * Add locks around the tailq add in create() API fn
> * Stylistic improvements for issues flagged by checkpatch
> 
> Changes in V3 patch:
> * Flush function rewritten as calls to process API
> * Additional doxygen comments on thread-safety of APIs
> 
> Signed-off-by: Bruce Richardson 
> ---
>  lib/librte_distributor/Makefile  |  50 
>  lib/librte_distributor/rte_distributor.c | 425 +++
>  lib/librte_distributor/rte_distributor.h | 194 ++
>  3 files changed, 669 insertions(+)
>  create mode 100644 lib/librte_distributor/Makefile
>  create mode 100644 lib/librte_distributor/rte_distributor.c
>  create mode 100644 lib/librte_distributor/rte_distributor.h
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> new file mode 100644
> index 000..36699f8
> --- /dev/null
> +++ b/lib/librte_distributor/Makefile
> @@ -0,0 +1,50 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_distributor.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +
> +# all source are stored in SRCS-y
> +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
> +
> +# install this header file
> +SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
> +
> +# this lib needs eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_mbuf
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> new file mode 100644
> index 000..5eee442
> --- /dev/null
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -0,0 +1,425 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> +