[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-17 Thread Zhang, Helin
Hi Andrey

Yes, Fortville supports hardware symmetric hashing offload. I am waiting for the
comments of its patch set submitted recently, and hopefully it can be accepted 
soon.
To use it, we need to enable the hash function of symmetric, but not the default
one of Toeplitz hash function. Then we need to set the swap configurations of 
it.
In detail, it is to set the offsets and length of the packet contents to be 
symmetric
hashed.

For more details, please refer to its datasheet (possible chapter of 7.1.10 
Hash Functions)!

Regards,
Helin

> -Original Message-
> From: Chilikin, Andrey
> Sent: Saturday, November 15, 2014 12:57 AM
> To: Ananyev, Konstantin; Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
> Cc: Yuanzhang Hu; Zhang, Helin
> Subject: RE: [dpdk-dev] Load-balancing position field in DPDK load_balancer
> sample app vs. Hash table
> 
> Fortville supports symmetrical hashing on HW level, a patch for i40e PMD was
> submitted a couple of weeks ago. For Niantic you can use symmetrical  rss key
> recommended by Konstantin.
> 
> Regards,
> Andrey
> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, November 14, 2014 4:50 PM
> To: Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
> Cc: Yuanzhang Hu
> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK load_balancer
> sample app vs. Hash table
> 
> > -Original Message-
> > From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
> > Sent: Friday, November 14, 2014 4:23 PM
> > To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
> > Cc: Yuanzhang Hu
> > Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
> > load_balancer sample app vs. Hash table
> >
> > I'd like to interject a question here.
> >
> > In case of flow classification, one might possibly prefer for packets
> > from the same flow to fall on the same logical core. With this '%'
> > load balancing, it would require to get the same RSS hash value for
> > packets with direct (src to dst) and swapped (dst to src) IPs and
> > ports. Am I correct that hardware RSS calculation cannot provide this
> symmetry?
> 
> As I remember, it is possible but you have to tweak rss key values.
> Here is a paper describing how to do that:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
> 
> Konstantin
> 
> >
> > 14.11.2014 20:44, Ananyev, Konstantin ?:
> > > If you have a NIC that is capable to do HW hash computation, then
> > > you can do your load balancing based on that value.
> > > Let say ixgbe/igb/i40e NICs can calculate RSS hash value based on
> > > different combinations of dst/src Ips, dst/src ports.
> > > This value can be stored inside mbuf for each RX packet by PMD RX 
> > > function.
> > > Then you can do:
> > > worker_id = mbuf->hash.rss % n_workersl
> > >
> > > That might to provide better balancing then using just one byte
> > > value, plus should be a bit faster, as in that case your balancer code 
> > > don't
> need to touch packet's data.
> > >
> > > Konstantin
> >
> > --
> > Sincerely,
> >
> > Yerden Zhumabekov
> > State Technical Service
> > Astana, KZ
> >



[dpdk-dev] [PATCH RFC] lib/librte_vhost: vhost-user

2014-11-17 Thread Tetsuya Mukawa
Hi Xie,


(2014/11/15 10:14), Huawei Xie wrote:
> implement socket server
> fd event dispatch mechanism
> vhost sock  message handling
> memory map for each region
> VHOST_USER_SET_VRING_KICK_FD as the indicator that vring is available
> VHOST_USER_GET_VRING_BASE as the message that vring should be released
>   
> The message flow between vhost-user and vhost-cuse is kindof different,
> which makes virtio-net common message handler layer difficult and complicated 
> to handle
> both cases in new_device/destroy_device/memory map/resource cleanup.
>
> Will only leave the most common messag handling in virtio-net, and move the
> control logic to cuse/fuse layer.  
>
>
> Signed-off-by: Huawei Xie 
Great patch!
I guess we can start from this patch to implement vhost-user and
abstraction layer.

I've checked patch.

1. White space, tab and indent patch.
I will send patch that clears white space, tab and indent. Could you
please check it?
It might be difficult to see the difference, if your editor doesn't show
a space or tab.

2. Some files are based on old codes.
At least, following patch is not included.
- vhost: fix build without unused result
Also vhost_rxtx.c isn't probably based on latest code.

3. Device abstraction layer code
I will send the device abstraction layer code after this email.
Anyway, I guess we need to decide whether, or not we still keep
vhost-cuse code

4. Multiple devices operation.
For example, when thread1 opens vhost-user device1 and thread2 opens
vhost-user device2,
each thread may want to register own callbacks.
Current implementation may not allow this.
I guess we need to eliminate global variables in librte_vhost as much as
possible.

Thanks,
Tetsuya

> ---
>  lib/librte_vhost/Makefile |  14 +-
>  lib/librte_vhost/eventfd_link/eventfd_link.c  |  27 +-
>  lib/librte_vhost/eventfd_link/eventfd_link.h  |  48 +-
>  lib/librte_vhost/libvirt/qemu-wrap.py | 367 ---
>  lib/librte_vhost/rte_virtio_net.h | 106 ++---
>  lib/librte_vhost/vhost-cuse/vhost-net-cdev.c  | 436 ++
>  lib/librte_vhost/vhost-cuse/virtio-net-cdev.c | 314 +
>  lib/librte_vhost/vhost-cuse/virtio-net-cdev.h |  43 ++
>  lib/librte_vhost/vhost-net-cdev.c | 389 
>  lib/librte_vhost/vhost-net-cdev.h | 113 -
>  lib/librte_vhost/vhost-user/fd_man.c  | 158 +++
>  lib/librte_vhost/vhost-user/fd_man.h  |  31 ++
>  lib/librte_vhost/vhost-user/vhost-net-user.c  | 417 +
>  lib/librte_vhost/vhost-user/vhost-net-user.h  |  74 +++
>  lib/librte_vhost/vhost-user/virtio-net-user.c | 208 +
>  lib/librte_vhost/vhost-user/virtio-net-user.h |  11 +
>  lib/librte_vhost/vhost_rxtx.c | 625 
> --
>  lib/librte_vhost/virtio-net.c | 450 ---
>  18 files changed, 1939 insertions(+), 1892 deletions(-)
>  delete mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py
>  create mode 100644 lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost-cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost-cuse/virtio-net-cdev.h
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost-user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost-user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost-user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost-user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost-user/virtio-net-user.h
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index c008d64..cb4e172 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -34,17 +34,19 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_vhost.a
>  
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I. -I vhost-user -I vhost-cuse -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c
> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-cuse/vhost-net-cdev.c 
> vhost-cuse/virtio-net-cdev.c
> +
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-user/fd_man.c 
> vhost-user/vhost-net-user.c vhost-user/virtio-net-user.c
> +
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net.c vhost_rxtx.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
>  
> -# dependencies
> -DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal
> -DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_ether
> -DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_mbuf
> +# this lib needs eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal lib/librte_mbuf
>  
>  include $(RTE_SDK)

[dpdk-dev] [RFC PATCH] lib/librte_vhost: cleanup white spaces, tabs and indents

2014-11-17 Thread Tetsuya Mukawa
---
 lib/librte_vhost/rte_virtio_net.h |  4 +--
 lib/librte_vhost/vhost-cuse/vhost-net-cdev.c  |  4 +--
 lib/librte_vhost/vhost-cuse/virtio-net-cdev.c |  8 ++---
 lib/librte_vhost/vhost-user/fd_man.c  | 13 
 lib/librte_vhost/vhost-user/fd_man.h  |  2 +-
 lib/librte_vhost/vhost-user/vhost-net-user.c  | 37 +++---
 lib/librte_vhost/vhost-user/virtio-net-user.c | 44 +--
 lib/librte_vhost/vhost_rxtx.c |  2 +-
 lib/librte_vhost/virtio-net.c | 10 +++---
 9 files changed, 61 insertions(+), 63 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 7a05dab..7d7d001 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -140,12 +140,12 @@ gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
 }

 /**
- *  Disable features in feature_mask. Returns 0 on success.
+ * Disable features in feature_mask. Returns 0 on success.
  */
 int rte_vhost_feature_disable(uint64_t feature_mask);

 /**
- *  Enable features in feature_mask. Returns 0 on success.
+ * Enable features in feature_mask. Returns 0 on success.
  */
 int rte_vhost_feature_enable(uint64_t feature_mask);

diff --git a/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c 
b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
index 4671643..688ec00 100644
--- a/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
@@ -329,7 +329,7 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
} else {
int fd;
file = *(const struct vhost_vring_file *)in_buf;
-   LOG_DEBUG(VHOST_CONFIG, 
+   LOG_DEBUG(VHOST_CONFIG,
"kick/call idx:%d fd:%d\n", file.index, 
file.fd);
if ((fd = eventfd_copy(file.fd, ctx.pid)) < 0){
fuse_reply_ioctl(req, -1, NULL, 0);
@@ -338,7 +338,7 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
if (cmd == VHOST_SET_VRING_KICK) {
VHOST_IOCTL_R(struct vhost_vring_file, file, 
ops->set_vring_call);
}
-   else { 
+   else {
VHOST_IOCTL_R(struct vhost_vring_file, file, 
ops->set_vring_kick);
}
}
diff --git a/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c 
b/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c
index 5c16aa5..7381140 100644
--- a/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost-cuse/virtio-net-cdev.c
@@ -288,7 +288,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, const 
struct vhost_memory *mem_r
base_address =
regions[idx].userspace_address;
/* Map VM memory file */
-   if (host_memory_map(ctx.pid, base_address, 
+   if (host_memory_map(ctx.pid, base_address,
&mapped_address, &mapped_size) != 0) {
return -1;
}
@@ -297,18 +297,18 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, const 
struct vhost_memory *mem_r

/* Check that we have a valid base address. */
if (base_address == 0) {
-   RTE_LOG(ERR, VHOST_CONFIG, 
+   RTE_LOG(ERR, VHOST_CONFIG,
"Failed to find base address of qemu memory file.\n");
return -1;
}

for (idx = 0; idx < nregions; idx++) {
-   regions[idx].address_offset = 
+   regions[idx].address_offset =
mapped_address - base_address +
regions[idx].userspace_address -
regions[idx].guest_phys_address;
}
-   
+
ops->set_mem_table(ctx, ®ions[0], nregions);
return 0;
 }
diff --git a/lib/librte_vhost/vhost-user/fd_man.c 
b/lib/librte_vhost/vhost-user/fd_man.c
index c7fd3f2..cbc656b 100644
--- a/lib/librte_vhost/vhost-user/fd_man.c
+++ b/lib/librte_vhost/vhost-user/fd_man.c
@@ -15,7 +15,7 @@
  * Returns the index in the fdset for a fd.
  * If fd is -1, it means to search for a free entry.
  * @return
- *   Index for the fd, or -1 if fd isn't in the fdset.
+ *  Index for the fd, or -1 if fd isn't in the fdset.
  */
 static int
 fdset_find_fd(struct fdset *pfdset, int fd)
@@ -23,8 +23,8 @@ fdset_find_fd(struct fdset *pfdset, int fd)
int i;

for (i = 0; i < pfdset->num && pfdset->fd[i].fd != fd; i++);
-   
-   return i ==  pfdset->num ? -1 : i;
+
+   return i == pfdset->num ? -1 : i;
 }

 static int
@@ -35,7 +35,7 @@ fdset_find_free_slot(struct fdset *pfdset)
 }

 static void
-fdset_add_fd(struct fdset  *pfdset, int idx, int fd, fd_cb rcb, 
+fdset_add_fd(struct fdset

[dpdk-dev] [RFC PATCH 1/2] lib/librte_vhost: change macro name of include guard.

2014-11-17 Thread Tetsuya Mukawa
This patch changes include macro name like following.
- "_VIRTIO_NET_H_" > "_RTE_VIRTIO_NET_H_"
---
 lib/librte_vhost/rte_virtio_net.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 7d7d001..a09533d 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -31,8 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#ifndef _VIRTIO_NET_H_
-#define _VIRTIO_NET_H_
+#ifndef _RTE_VIRTIO_NET_H_
+#define _RTE_VIRTIO_NET_H_

 #include 
 #include 
@@ -189,4 +189,4 @@ uint32_t rte_vhost_enqueue_burst(struct virtio_net *dev, 
uint16_t queue_id,
 uint32_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint32_t count);

-#endif /* _VIRTIO_NET_H_ */
+#endif /* _RTE_VIRTIO_NET_H_ */
-- 
1.9.1



[dpdk-dev] [RFC PATCH 2/2] lib/librte_vhost: Add device abstraction layer

2014-11-17 Thread Tetsuya Mukawa
---
 lib/librte_vhost/Makefile |   6 +-
 lib/librte_vhost/rte_virtio_net.h |  22 -
 lib/librte_vhost/vhost-cuse/vhost-net-cdev.c  |   6 +-
 lib/librte_vhost/vhost-cuse/vhost-net-cdev.h  |  40 +
 lib/librte_vhost/vhost-cuse/virtio-net-cdev.c |   1 +
 lib/librte_vhost/vhost-net.c  | 101 +++
 lib/librte_vhost/vhost-net.h  | 114 ++
 lib/librte_vhost/vhost-user/vhost-net-user.c  |   6 +-
 lib/librte_vhost/vhost-user/vhost-net-user.h  |   3 +
 lib/librte_vhost/vhost-user/virtio-net-user.c |   1 +
 10 files changed, 290 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_vhost/vhost-cuse/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.c
 create mode 100644 lib/librte_vhost/vhost-net.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index cb4e172..4363a14 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -37,11 +37,11 @@ LIB = librte_vhost.a
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I. -I vhost-user -I vhost-cuse -O3 
-D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-cuse/vhost-net-cdev.c 
vhost-cuse/virtio-net-cdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-cuse/vhost-net-cdev.c 
vhost-cuse/virtio-net-cdev.c

-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-user/fd_man.c 
vhost-user/vhost-net-user.c vhost-user/virtio-net-user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost-user/fd_man.c 
vhost-user/vhost-net-user.c vhost-user/virtio-net-user.c

-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net.c vhost_rxtx.c vhost-net.c

 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index a09533d..116c7e9 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -140,6 +140,23 @@ gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
 }

 /**
+ * Enum for vhost driver types.
+ */
+enum rte_vhost_driver_t {
+   VHOST_DRV_CUSE, /* vhost-cuse driver */
+   VHOST_DRV_USER, /* vhost-user driver */
+   VHOST_DRV_NUM   /* the number of vhost driver types */
+};
+
+/**
+  * Structure contains information relating vhost driver.
+  */
+struct rte_vhost_driver {
+   enum rte_vhost_driver_t type;   /**< driver type. */
+   const char  *dev_name;  /**< accessing device name. */
+};
+
+/**
  * Disable features in feature_mask. Returns 0 on success.
  */
 int rte_vhost_feature_disable(uint64_t feature_mask);
@@ -155,12 +172,13 @@ uint64_t rte_vhost_feature_get(void);
 int rte_vhost_enable_guest_notification(struct virtio_net *dev, uint16_t 
queue_id, int enable);

 /* Register vhost driver. dev_name could be different for multiple instance 
support. */
-int rte_vhost_driver_register(const char *dev_name);
+struct rte_vhost_driver *rte_vhost_driver_register(
+   const char *dev_name, enum rte_vhost_driver_t type);

 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
const);

-int rte_vhost_driver_session_start(void);
+int rte_vhost_driver_session_start(struct rte_vhost_driver *drv);

 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
diff --git a/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c 
b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
index 688ec00..6ea54ee 100644
--- a/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
@@ -47,6 +47,7 @@

 #include "virtio-net-cdev.h"
 #include "vhost-net.h"
+#include "vhost-net-cdev.h"
 #include "eventfd_link/eventfd_link.h"

 #define FUSE_OPT_DUMMY "\0\0"
@@ -373,8 +374,9 @@ static const struct cuse_lowlevel_ops vhost_net_ops = {
  * vhost_net_device_ops are also passed when the device is registered in app.
  */
 int
-rte_vhost_driver_register(const char *dev_name)
+vhost_cuse_driver_register(struct rte_vhost_driver *drv)
 {
+   const char *dev_name = drv->dev_name;
struct cuse_info cuse_info;
char device_name[PATH_MAX] = "";
char char_device_name[PATH_MAX] = "";
@@ -428,7 +430,7 @@ rte_vhost_driver_register(const char *dev_name)
  * release and ioctl calls.
  */
 int
-rte_vhost_driver_session_start(void)
+vhost_cuse_driver_session_start(void)
 {
fuse_session_loop(session);

diff --git a/lib/librte_vhost/vhost-cuse/vhost-net-cdev.h 
b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.h
new file mode 100644
index 000..cb094ee
--- /dev/null
+++ b/lib/librte_vhost/vhost-cuse/vhost-net-cdev.h
@@ -0,0 +1,40 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2014 IGEL Co.,Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided

[dpdk-dev] [PATCH RFC] lib/librte_vhost: vhost-user

2014-11-17 Thread Tetsuya Mukawa
Hi Xie,

(2014/11/17 15:04), Tetsuya Mukawa wrote:
> Hi Xie,
>
>
> (2014/11/15 10:14), Huawei Xie wrote:
>> implement socket server
>> fd event dispatch mechanism
>> vhost sock  message handling
>> memory map for each region
>> VHOST_USER_SET_VRING_KICK_FD as the indicator that vring is available
>> VHOST_USER_GET_VRING_BASE as the message that vring should be released
>>   
>> The message flow between vhost-user and vhost-cuse is kindof different,
>> which makes virtio-net common message handler layer difficult and 
>> complicated to handle
>> both cases in new_device/destroy_device/memory map/resource cleanup.
>>
>> Will only leave the most common messag handling in virtio-net, and move the
>> control logic to cuse/fuse layer.  
>>
>>
>> Signed-off-by: Huawei Xie 
> Great patch!
> I guess we can start from this patch to implement vhost-user and
> abstraction layer.
>
> I've checked patch.
>
> 1. White space, tab and indent patch.
> I will send patch that clears white space, tab and indent. Could you
> please check it?
> It might be difficult to see the difference, if your editor doesn't show
> a space or tab.
>
> 2. Some files are based on old codes.
> At least, following patch is not included.
> - vhost: fix build without unused result
> Also vhost_rxtx.c isn't probably based on latest code.
>
> 3. Device abstraction layer code
> I will send the device abstraction layer code after this email.
> Anyway, I guess we need to decide whether, or not we still keep
> vhost-cuse code
Additionally, the above patches are based on your RFC patch.

Tetsuya

>
> 4. Multiple devices operation.
> For example, when thread1 opens vhost-user device1 and thread2 opens
> vhost-user device2,
> each thread may want to register own callbacks.
> Current implementation may not allow this.
> I guess we need to eliminate global variables in librte_vhost as much as
> possible.
>
> Thanks,
> Tetsuya
>
>> ---
>>  lib/librte_vhost/Makefile |  14 +-
>>  lib/librte_vhost/eventfd_link/eventfd_link.c  |  27 +-
>>  lib/librte_vhost/eventfd_link/eventfd_link.h  |  48 +-
>>  lib/librte_vhost/libvirt/qemu-wrap.py | 367 ---
>>  lib/librte_vhost/rte_virtio_net.h | 106 ++---
>>  lib/librte_vhost/vhost-cuse/vhost-net-cdev.c  | 436 ++
>>  lib/librte_vhost/vhost-cuse/virtio-net-cdev.c | 314 +
>>  lib/librte_vhost/vhost-cuse/virtio-net-cdev.h |  43 ++
>>  lib/librte_vhost/vhost-net-cdev.c | 389 
>>  lib/librte_vhost/vhost-net-cdev.h | 113 -
>>  lib/librte_vhost/vhost-user/fd_man.c  | 158 +++
>>  lib/librte_vhost/vhost-user/fd_man.h  |  31 ++
>>  lib/librte_vhost/vhost-user/vhost-net-user.c  | 417 +
>>  lib/librte_vhost/vhost-user/vhost-net-user.h  |  74 +++
>>  lib/librte_vhost/vhost-user/virtio-net-user.c | 208 +
>>  lib/librte_vhost/vhost-user/virtio-net-user.h |  11 +
>>  lib/librte_vhost/vhost_rxtx.c | 625 
>> --
>>  lib/librte_vhost/virtio-net.c | 450 ---
>>  18 files changed, 1939 insertions(+), 1892 deletions(-)
>>  delete mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py
>>  create mode 100644 lib/librte_vhost/vhost-cuse/vhost-net-cdev.c
>>  create mode 100644 lib/librte_vhost/vhost-cuse/virtio-net-cdev.c
>>  create mode 100644 lib/librte_vhost/vhost-cuse/virtio-net-cdev.h
>>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>>  create mode 100644 lib/librte_vhost/vhost-user/fd_man.c
>>  create mode 100644 lib/librte_vhost/vhost-user/fd_man.h
>>  create mode 100644 lib/librte_vhost/vhost-user/vhost-net-user.c
>>  create mode 100644 lib/librte_vhost/vhost-user/vhost-net-user.h
>>  create mode 100644 lib/librte_vhost/vhost-user/virtio-net-user.c
>>  create mode 100644 lib/librte_vhost/vhost-user/virtio-net-user.h
>>
>> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
>> index c008d64..cb4e172 100644
>> --- a/lib/librte_vhost/Makefile
>> +++ b/lib/librte_vhost/Makefile
>> @@ -34,17 +34,19 @@ include $(RTE_SDK)/mk/rte.vars.mk
>>  # library name
>>  LIB = librte_vhost.a
>>  
>> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
>> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I. -I vhost-user -I vhost-cuse -O3 
>> -D_FILE_OFFSET_BITS=64 -lfuse
>>  LDFLAGS += -lfuse
>>  # all source are stored in SRCS-y
>> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c 
>> vhost_rxtx.c
>> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-cuse/vhost-net-cdev.c 
>> vhost-cuse/virtio-net-cdev.c
>> +
>> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-user/fd_man.c 
>> vhost-user/vhost-net-user.c vhost-user/virtio-net-user.c
>> +
>> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net.c vhost_rxtx.c
>>  
>>  # install includes
>>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
>>  
>> -# dependencies
>> -DEPDIRS-$(CONFIG_RTE

[dpdk-dev] vhost-user technical isssues

2014-11-17 Thread Tetsuya Mukawa
Hi Xie,

(2014/11/14 19:59), Xie, Huawei wrote:
> I tested with latest qemu(with offset fix) in vhost app(not with test case), 
> unmap succeeds only when the size is aligned to 1GB(hugepage size).
I appreciate for your testing.

> Another important thing is  could we do mmap(0, region[i].memory_size, 
> PROT_XX, mmap_offset) rather than with offset 0? With the region above 4GB, 
> we will waste 4GB address space. Or we at least need to round down offset to 
> nearest 1GB, and round up memory size to upper 1GB, to save some address 
> space waste.
>
> Anyway, this is ugly. Kernel doesn't take care of us, do those alignment for 
> us automatically.
>

It seems 'offset' also should be aligned by hugepage size also.
But it might be a specification of mmap. Manpage of mmap says 'offset'
should be aligned by sysconf(_SC_PAGE_SIZE).
If the target file is on hugetlbfs, I guess hugepage size is used as
alignment size.

Thanks,
Tetsuya




[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-17 Thread Liu, Jijiang


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Friday, November 14, 2014 5:10 PM
> To: Liu, Jijiang; Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hi Jijiang,
> 
> On 11/14/2014 09:15 AM, Liu, Jijiang wrote:
> >
> > Thomas Monjalon wrote:
> >>
> >> You mean that PKT_TX_VXLAN_CKSUM request hardware checksumming of
> >> outer L3, outer L4, inner L3 and inner L4?
> >> So maybe the name and comments are not enough clear.
> >
> > Yes, PKT_TX_VXLAN_CKSUM request hardware checksum of outerL3, outer L4,
> inner L3 and inner L4.
> 
> I don't understand: it looks in contradiction with our previous
> discussion:
> 
> Olivier Matz wrote:
> >
> > Liu, Jijiang wrote:
> >>
> >> Olivier Matz wrote:
> >>> What is the
> >>> meaning of this flag? Is it enough to checksum outer L3, inner L3,
> >>> and inner L4 as specified in commit log? If yes, why are the other
> >>> flags PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM, (...) added in the mbuf
> later?
> >>> In my comprehension, these flags are needed in addition to
> >>> PKT_TX_VXLAN_CKSUM to do the checksum of the inner headers.
> >>
> >> Yes, these flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM)  are needed by
> >> HW offload of non-tunneling and tunneling  packet.
> >
> > OK, so I understand that when PKT_TX_VXLAN_CKSUM is set, if the driver
> > supports it, it will process IP and UDP checksum of outer header,
> > using l2_len and l3_len.
> 
> So you say that PKT_TX_VXLAN_CKSUM is enough for all inner and outer headers,
> but also that PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM are needed.
> What occurs if we don't set them?
> 
> Now let's say you have an application that receives a TCP packet, then
> encaspulate it in vxlan, and forward it. You want to regenerate the checksum 
> for
> the new outer headers, but you don't need to change the inner ones.
> You say that setting the PKT_TX_VXLAN_CKSUM will request the hw to process
> inner and outer checksum. This is not required in that case.
> Also, do you need to set the pseudo header checksum in the TCP inner header?

Anyway, I explain the checksum mechanism here again.

In my VXLAN patch set, for an VXLAN packet TX checksum offload,  the logics 
below:

1. only set outer L3/L4 header TX checksum
tx_checksum set 0x3(0r 0x1) 0
  In this case, the PKT_TX_VXLAN_CKSUM flag is not set as we don't set inner 
flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM), so we don't need to change inner 
ones, driver think it is the ordinary packet,  
HW will do outer L3/L4 checksum offload. 

2. only set inner L3/L4 header TX checksum
tx_checksum set 0x30 0
  In this case, the PKT_TX_VXLAN_CKSUM flag is set, so driver think it is VXLAN 
packet, and we don't need to change outer ones because we don't set outer flags 
here (PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM).

3. set outer L3/L4 TX checksum and inner L3&L4 TX checksum 
tx_checksum set 0x31(0x33) 0
in this case, the PKT_TX_VXLAN_CKSUM flag is set, driver think it is VXLAN 
packet, and we need to change outer and inner as both outer and inner flags are 
set.

I'm reviewing your TSO patch to see if your logic is correct in the checksum 
engine.


> Regards,
> Olivier


[dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine

2014-11-17 Thread Liu, Jijiang


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Saturday, November 15, 2014 1:03 AM
> To: dev at dpdk.org
> Cc: olivier.matz at 6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw at gmail.com; Richardson, Bruce
> Subject: [PATCH v2 08/13] testpmd: rework csum forward engine
> 
> The csum forward engine was becoming too complex to be used and extended
> (the next commits want to add the support of TSO):
> 
> - no explaination about what the code does
> - code is not factorized, lots of code duplicated, especially between
>   ipv4/ipv6
> - user command line api: use of bitmasks that need to be calculated by
>   the user
> - the user flags don't have the same semantic:
>   - for legacy IP/UDP/TCP/SCTP, it selects software or hardware checksum
>   - for other (vxlan), it selects between hardware checksum or no
> checksum
> - the code relies too much on flags set by the driver without software
>   alternative (ex: PKT_RX_TUNNEL_IPV4_HDR). It is nice to be able to
>   compare a software implementation with the hardware offload.
> 
> This commit tries to fix these issues, and provide a simple definition of 
> what is
> done by the forward engine:
> 
>  * Receive a burst of packets, and for supported packet types:
>  *  - modify the IPs
>  *  - reprocess the checksum in SW or HW, depending on testpmd command line
>  *configuration
>  * Then packets are transmitted on the output port.
>  *
>  * Supported packets are:
>  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
>  *   Ether / (vlan) / IP|IP6 / UDP / VxLAN / Ether / IP|IP6 / UDP|TCP|SCTP
>  *
>  * The network parser supposes that the packet is contiguous, which may
>  * not be the case in real life.
> 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/cmdline.c  | 151 ---
>  app/test-pmd/config.c   |  11 -
>  app/test-pmd/csumonly.c | 668 ++-
> -
>  app/test-pmd/testpmd.h  |  17 +-
>  4 files changed, 423 insertions(+), 424 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> 4c3fc76..0361e58 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -310,19 +310,14 @@ static void cmd_help_long_parsed(void
> *parsed_result,
>   "Disable hardware insertion of a VLAN header in"
>   " packets sent on a port.\n\n"
> 
> - "tx_checksum set (mask) (port_id)\n"
> - "Enable hardware insertion of checksum offload with"
> - " the 8-bit mask, 0~0xff, in packets sent on a port.\n"
> - "bit 0 - insert ip   checksum offload if set\n"
> - "bit 1 - insert udp  checksum offload if set\n"
> - "bit 2 - insert tcp  checksum offload if set\n"
> - "bit 3 - insert sctp checksum offload if set\n"
> - "bit 4 - insert inner ip  checksum offload if 
> set\n"
> - "bit 5 - insert inner udp checksum offload if 
> set\n"
> - "bit 6 - insert inner tcp checksum offload if 
> set\n"
> - "bit 7 - insert inner sctp checksum offload if 
> set\n"
> + "tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw)
> (port_id)\n"
> + "Enable hardware calculation of checksum with when"
> + " transmitting a packet using 'csum' forward engine.\n"
>   "Please check the NIC datasheet for HW limits.\n\n"
> 
> + "tx_checksum show (port_id)\n"
> + "Display tx checksum offload configuration\n\n"
> +
>   "set fwd (%s)\n"
>   "Set packet forwarding mode.\n\n"
> 
> @@ -2738,48 +2733,131 @@ cmdline_parse_inst_t cmd_tx_vlan_reset = {
> 
> 
>  /* *** ENABLE HARDWARE INSERTION OF CHECKSUM IN TX PACKETS *** */ -
> struct cmd_tx_cksum_set_result {
> +struct cmd_tx_cksum_result {
>   cmdline_fixed_string_t tx_cksum;
> - cmdline_fixed_string_t set;
> - uint8_t cksum_mask;
> + cmdline_fixed_string_t mode;
> + cmdline_fixed_string_t proto;
> + cmdline_fixed_string_t hwsw;
>   uint8_t port_id;
>  };
> 
>  static void
> -cmd_tx_cksum_set_parsed(void *parsed_result,
> +cmd_tx_cksum_parsed(void *parsed_result,
>  __attribute__((unused)) struct cmdline *cl,
>  __attribute__((unused)) void *data)  {
> - struct cmd_tx_cksum_set_result *res = parsed_result;
> + struct cmd_tx_cksum_result *res = parsed_result;
> + int hw = 0;
> + uint16_t ol_flags, mask = 0;
> + struct rte_eth_dev_info dev_info;
> +
> + if (port_id_is_invalid(res->port_id)) {
> + printf("invalid port %d\n", res->port_id);
> + return;
> + }
> 
> - tx_cksum_set(res->port_id, res->cksum_mask);
>

[dpdk-dev] [PULL REQUEST] doc: programmers guide.

2014-11-17 Thread Bernard Iremonger
These changes are a conversion of the Programmers Guide from an MSWord file to 
Sphinx rst files.

The following changes since commit 07db4a9750940bbf95299a89afd12d178556e064:

  examples/distributor: new sample app (2014-11-16 22:54:56 +0100)

are available in the git repository at:
  git://dpdk.org/next/dpdk-doc  master

Bernard Iremonger (1):
  doc: programmers guide

 doc/guides/index.rst   |1 +
 doc/guides/prog_guide/build_app.rst|  128 +
 doc/guides/prog_guide/dev_kit_build_system.rst |  418 
 doc/guides/prog_guide/dev_kit_root_make_help.rst   |  255 ++
 doc/guides/prog_guide/driver_vm_emul_dev.rst   |  178 ++
 doc/guides/prog_guide/env_abstraction_layer.rst|  215 ++
 doc/guides/prog_guide/ext_app_lib_make_help.rst|  125 +
 doc/guides/prog_guide/extend_intel_dpdk.rst|  136 ++
 doc/guides/prog_guide/glossary.rst |  199 ++
 doc/guides/prog_guide/hash_lib.rst |  134 ++
 .../prog_guide/i40e_ixgbe_igb_virt_func_drv.rst|  553 +
 .../prog_guide/img/architecture-overview.svg   | 1011 
 doc/guides/prog_guide/img/blk_diag_dropper.png |  Bin 0 -> 55303 bytes
 doc/guides/prog_guide/img/console.png  |  Bin 0 -> 40850 bytes
 doc/guides/prog_guide/img/data_struct_per_port.png |  Bin 0 -> 58769 bytes
 doc/guides/prog_guide/img/dpdk_xen_pkt_switch.png  |  Bin 0 -> 163842 bytes
 doc/guides/prog_guide/img/drop_probability_eq3.png |  Bin 0 -> 3205 bytes
 doc/guides/prog_guide/img/drop_probability_eq4.png |  Bin 0 -> 2737 bytes
 .../prog_guide/img/drop_probability_graph.png  |  Bin 0 -> 62349 bytes
 doc/guides/prog_guide/img/eq2_expression.png   |  Bin 0 -> 1614 bytes
 doc/guides/prog_guide/img/eq2_factor.png   |  Bin 0 -> 995 bytes
 doc/guides/prog_guide/img/ewma_filter_eq_1.png |  Bin 0 -> 840 bytes
 doc/guides/prog_guide/img/ewma_filter_eq_2.png |  Bin 0 -> 1462 bytes
 .../prog_guide/img/ex_data_flow_tru_dropper.png|  Bin 0 -> 32578 bytes
 doc/guides/prog_guide/img/fast_pkt_proc.png|  Bin 0 -> 355905 bytes
 doc/guides/prog_guide/img/figure32.png |  Bin 0 -> 11603 bytes
 doc/guides/prog_guide/img/figure33.png |  Bin 0 -> 65216 bytes
 doc/guides/prog_guide/img/figure34.png |  Bin 0 -> 11581 bytes
 doc/guides/prog_guide/img/figure35.png |  Bin 0 -> 75012 bytes
 doc/guides/prog_guide/img/figure37.png |  Bin 0 -> 6934 bytes
 doc/guides/prog_guide/img/figure38.png |  Bin 0 -> 7372 bytes
 doc/guides/prog_guide/img/figure39.png |  Bin 0 -> 55986 bytes
 doc/guides/prog_guide/img/flow_tru_droppper.png|  Bin 0 -> 30870 bytes
 doc/guides/prog_guide/img/forward_stats.png|  Bin 0 -> 8849 bytes
 doc/guides/prog_guide/img/grant_refs.png   |  Bin 0 -> 6405 bytes
 doc/guides/prog_guide/img/grant_table.png  |  Bin 0 -> 96762 bytes
 doc/guides/prog_guide/img/hier_sched_blk.png   |  Bin 0 -> 36328 bytes
 doc/guides/prog_guide/img/host_vm_comms.png|  Bin 0 -> 16487 bytes
 doc/guides/prog_guide/img/host_vm_comms_qemu.png   |  Bin 0 -> 15383 bytes
 doc/guides/prog_guide/img/inter_vm_comms.png   |  Bin 0 -> 370244 bytes
 doc/guides/prog_guide/img/ivshmem.png  |  Bin 0 -> 44920 bytes
 doc/guides/prog_guide/img/kernel_nic_intf.png  |  Bin 0 -> 185839 bytes
 doc/guides/prog_guide/img/kni_traffic_flow.png |  Bin 0 -> 366308 bytes
 doc/guides/prog_guide/img/link_bonding.png |  Bin 0 -> 223318 bytes
 doc/guides/prog_guide/img/linuxapp_launch.svg  |  762 ++
 doc/guides/prog_guide/img/m_definition.png |  Bin 0 -> 1261 bytes
 doc/guides/prog_guide/img/malloc_heap.png  |  Bin 0 -> 81329 bytes
 doc/guides/prog_guide/img/mbuf1.svg|  584 +
 doc/guides/prog_guide/img/mbuf2.svg| 1263 ++
 doc/guides/prog_guide/img/memory-management.svg| 2164 +
 doc/guides/prog_guide/img/memory-management2.svg   | 2301 ++
 doc/guides/prog_guide/img/mempool.svg  | 2434 
 doc/guides/prog_guide/img/multi_process_memory.svg |  102 +
 doc/guides/prog_guide/img/packet_distributor1.png  |  Bin 0 -> 99482 bytes
 doc/guides/prog_guide/img/packet_distributor2.png  |  Bin 0 -> 102867 bytes
 doc/guides/prog_guide/img/perf_benchmark.png   |  Bin 0 -> 392248 bytes
 doc/guides/prog_guide/img/pipe_prefetch_sm.png |  Bin 0 -> 71898 bytes
 doc/guides/prog_guide/img/pkt_drop_probability.png |  Bin 0 -> 46368 bytes
 doc/guides/prog_guide/img/pkt_flow_kni.png |  Bin 0 -> 51088 bytes
 .../prog_guide/img/pkt_proc_pipeline_qos.png   |  Bin 0 -> 93198 bytes
 doc/guides/prog_guide/img/prefetch_pipeline.png|  Bin 0 -> 56358 bytes
 doc/guides/prog_guide/img/ring-dequeue1.svg|  690 ++
 doc/guides/prog_guide/img/ring-dequeue2.svg|  653 ++
 doc/guides/prog_guide/img/ring-dequeue3.svg|

[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-17 Thread Chilikin, Andrey
Fortville can calculate hash for packets encapsulated into different tunnels: 
GRE/NVGRE, VXLAN, QinQ (S-Tag + C-Tag), but at the moment current version of 
DPDK supports only VXLAN.

Regards,
Andrey

-Original Message-
From: Yerden Zhumabekov [mailto:e_zhumabe...@sts.kz] 
Sent: Friday, November 14, 2014 5:21 PM
To: Chilikin, Andrey; Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
Cc: Yuanzhang Hu; Zhang, Helin
Subject: Re: [dpdk-dev] Load-balancing position field in DPDK load_balancer 
sample app vs. Hash table

Thank you. And one more thing, does Fortville (or Niantic) support various L2 
headers when calculating RSS hash? I mean MPLS, QinQ, etc.?

14.11.2014 22:57, Chilikin, Andrey ?:
> Fortville supports symmetrical hashing on HW level, a patch for i40e PMD was 
> submitted a couple of weeks ago. For Niantic you can use symmetrical  rss key 
> recommended by Konstantin.
>
> Regards,
> Andrey
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, 
> Konstantin
> Sent: Friday, November 14, 2014 4:50 PM
> To: Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
> Cc: Yuanzhang Hu
> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK 
> load_balancer sample app vs. Hash table
>
>> -Original Message-
>> From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
>> Sent: Friday, November 14, 2014 4:23 PM
>> To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
>> Cc: Yuanzhang Hu
>> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK 
>> load_balancer sample app vs. Hash table
>>
>> I'd like to interject a question here.
>>
>> In case of flow classification, one might possibly prefer for packets 
>> from the same flow to fall on the same logical core. With this '%'
>> load balancing, it would require to get the same RSS hash value for 
>> packets with direct (src to dst) and swapped (dst to src) IPs and 
>> ports. Am I correct that hardware RSS calculation cannot provide this 
>> symmetry?
> As I remember, it is possible but you have to tweak rss key values.
> Here is a paper describing how to do that:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
>
> Konstantin
>

--
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v2 05/13] mbuf: remove too specific PKT_TX_OFFLOAD_MASK definition

2014-11-17 Thread Bruce Richardson
On Fri, Nov 14, 2014 at 06:03:21PM +0100, Olivier Matz wrote:
> This definition is specific to Intel PMD drivers and its definition
> "indicate what bits required for building TX context" shows that it
> should not be in the generic rte_mbuf.h but in the PMD driver.
> 
> Signed-off-by: Olivier Matz 

Acked-by: Bruce Richardson 

> ---
>  lib/librte_mbuf/rte_mbuf.h| 5 -
>  lib/librte_pmd_e1000/igb_rxtx.c   | 8 +++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 8 +++-
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index d7070e9..68fb988 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -129,11 +129,6 @@ extern "C" {
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */
>  
> -/**
> - * Bit Mask to indicate what bits required for building TX context
> - */
> -#define PKT_TX_OFFLOAD_MASK (PKT_TX_VLAN_PKT | PKT_TX_IP_CKSUM | 
> PKT_TX_L4_MASK)
> -
>  /* define a set of marker types that can be used to refer to set points in 
> the
>   * mbuf */
>  typedef void*MARKER[0];   /**< generic marker for a point in a structure 
> */
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
> index b406397..433c616 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -84,6 +84,12 @@
>   ETH_RSS_IPV6_UDP | \
>   ETH_RSS_IPV6_UDP_EX)
>  
> +/* Bit Mask to indicate what bits required for building TX context */
> +#define IGB_TX_OFFLOAD_MASK ( \
> + PKT_TX_VLAN_PKT |\
> + PKT_TX_IP_CKSUM |\
> + PKT_TX_L4_MASK)
> +
>  static inline struct rte_mbuf *
>  rte_rxmbuf_alloc(struct rte_mempool *mp)
>  {
> @@ -400,7 +406,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf 
> **tx_pkts,
>   ol_flags = tx_pkt->ol_flags;
>   vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>   vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> - tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
> + tx_ol_req = ol_flags & IGB_TX_OFFLOAD_MASK;
>  
>   /* If a Context Descriptor need be built . */
>   if (tx_ol_req) {
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 7e470ce..ca35db2 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -90,6 +90,12 @@
>   ETH_RSS_IPV6_UDP | \
>   ETH_RSS_IPV6_UDP_EX)
>  
> +/* Bit Mask to indicate what bits required for building TX context */
> +#define IXGBE_TX_OFFLOAD_MASK (   \
> + PKT_TX_VLAN_PKT |\
> + PKT_TX_IP_CKSUM |\
> + PKT_TX_L4_MASK)
> +
>  static inline struct rte_mbuf *
>  rte_rxmbuf_alloc(struct rte_mempool *mp)
>  {
> @@ -580,7 +586,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   ol_flags = tx_pkt->ol_flags;
>  
>   /* If hardware offload required */
> - tx_ol_req = ol_flags & PKT_TX_OFFLOAD_MASK;
> + tx_ol_req = ol_flags & IXGBE_TX_OFFLOAD_MASK;
>   if (tx_ol_req) {
>   vlan_macip_lens.f.vlan_tci = tx_pkt->vlan_tci;
>   vlan_macip_lens.f.l2_l3_len = tx_pkt->l2_l3_len;
> -- 
> 2.1.0
> 


[dpdk-dev] [PATCH 2/4] doc: Corrected info for tx_checksum set mask function, in testpmd UG

2014-11-17 Thread Olivier MATZ
Hi Pablo,

On 11/15/2014 08:13 PM, Pablo de Lara wrote:
> tx_checksum set mask function now allows 4 extra bits in the mask
> for TX checksum offload
> 
> Signed-off-by: Pablo de Lara 
> ---
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |   10 +-
>  1 files changed, 9 insertions(+), 1 deletions(-)

A patch reworking the csumonly API is pending:
http://dpdk.org/ml/archives/dev/2014-November/008188.html

I don't know if it will be accepted, but just to mention that
these 2 patches will conflict in this case.

Regards,
Olivier


[dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag

2014-11-17 Thread Bruce Richardson
On Fri, Nov 14, 2014 at 06:03:22PM +0100, Olivier Matz wrote:
> In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> The issue is that the list of flags in the application has to be
> synchronized with the flags defined in rte_mbuf.h.
> 
> This patch introduces 2 new functions rte_get_rx_ol_flag_name()
> and rte_get_tx_ol_flag_name() that returns the name of a flag from
> its mask. It also fixes rxonly.c to use this new functions and to
> display the proper flags.
> 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/rxonly.c  | 36 ++--
>  lib/librte_mbuf/rte_mbuf.c | 45 +
>  lib/librte_mbuf/rte_mbuf.h | 22 ++
>  3 files changed, 77 insertions(+), 26 deletions(-)
> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 9ad1df6..51a530a 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -71,26 +71,6 @@
>  
>  #include "testpmd.h"
>  
> -#define MAX_PKT_RX_FLAGS 13
> -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
> - "VLAN_PKT",
> - "RSS_HASH",
> - "PKT_RX_FDIR",
> - "IP_CKSUM",
> - "IP_CKSUM_BAD",
> -
> - "IPV4_HDR",
> - "IPV4_HDR_EXT",
> - "IPV6_HDR",
> - "IPV6_HDR_EXT",
> -
> - "IEEE1588_PTP",
> - "IEEE1588_TMST",
> -
> - "TUNNEL_IPV4_HDR",
> - "TUNNEL_IPV6_HDR",
> -};
> -
>  static inline void
>  print_ether_addr(const char *what, struct ether_addr *eth_addr)
>  {
> @@ -214,12 +194,16 @@ pkt_burst_receive(struct fwd_stream *fs)
>   printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
>   printf("\n");
>   if (ol_flags != 0) {
> - int rxf;
> -
> - for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
> - if (ol_flags & (1 << rxf))
> - printf("  PKT_RX_%s\n",
> -pkt_rx_flag_names[rxf]);
> + unsigned rxf;
> + const char *name;
> +
> + for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
> + if ((ol_flags & (1ULL << rxf)) == 0)
> + continue;
> + name = rte_get_rx_ol_flag_name(1ULL << rxf);
> + if (name == NULL)
> + continue;
> + printf("  %s\n", name);
>   }
>   }
>   rte_pktmbuf_free(mb);
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 52e7574..5cd9137 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -196,3 +196,48 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, 
> unsigned dump_len)
>   nb_segs --;
>   }
>  }
> +
> +/*
> + * Get the name of a RX offload flag
> + */
> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
> +{
> + switch (mask) {
> + case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> + case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> + case PKT_RX_FDIR: return "PKT_RX_FDIR";
> + case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> + case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> + /* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> + /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> + /* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> + /* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> + /* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
> + case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> + case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> + case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
> + case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
> + case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
> + case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
> + case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
> + case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
> + default: return NULL;
> + }
> +}
> +
> +/*
> + * Get the name of a TX offload flag
> + */
> +const char *rte_get_tx_ol_flag_name(uint64_t mask)
> +{
> + switch (mask) {
> + case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
> + case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
> + case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
> + case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> + case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> + case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> + case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> + default: return NULL;
> + }
> +}
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 68fb988..e76617f 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte

[dpdk-dev] [PATCH 2/4] doc: Corrected info for tx_checksum set mask function, in testpmd UG

2014-11-17 Thread De Lara Guarch, Pablo
Hi Olivier,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Monday, November 17, 2014 10:39 AM
> To: De Lara Guarch, Pablo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/4] doc: Corrected info for tx_checksum set
> mask function, in testpmd UG
> 
> Hi Pablo,
> 
> On 11/15/2014 08:13 PM, Pablo de Lara wrote:
> > tx_checksum set mask function now allows 4 extra bits in the mask
> > for TX checksum offload
> >
> > Signed-off-by: Pablo de Lara 
> > ---
> >  doc/guides/testpmd_app_ug/testpmd_funcs.rst |   10 +-
> >  1 files changed, 9 insertions(+), 1 deletions(-)
> 
> A patch reworking the csumonly API is pending:
> http://dpdk.org/ml/archives/dev/2014-November/008188.html
> 
> I don't know if it will be accepted, but just to mention that
> these 2 patches will conflict in this case.

Thanks for spotting it! I guess that at this point, all we can do is wait.
If you patch gets applied before mine, I will send a v2 with the changes.
If it gets applied after, then I will send another patch to fix it.

Pablo
> 
> Regards,
> Olivier


[dpdk-dev] one lightwight rte_eal_init() for SECONDARY processes which only use sharedmemory

2014-11-17 Thread Bruce Richardson
On Sun, Nov 16, 2014 at 03:06:15AM +, Chi, Xiaobo (NSN - CN/Hangzhou) wrote:
> Hi, Bruce and Neil,
> Do you indicate that to add one program arg, such as "--mem-only", to let the 
> rte_eal_init()only to do those memory initializing things? I am worrying that 
> this would make rte_eal_init() more complex due to the "--mem-only" should be 
> judged here and there, even in some sub-functions. 
> 

Where, apart from rte_eal_init() and parse_args, would it need to be added? 
Unless
it's a very long list of functions that need to be changed, I would still expect
the flag solution to be easier to maintain than a new set of functions.

Regards,
/Bruce

> brgs,
> chi xiaobo
> 
> -Original Message-
> From: ext Neil Horman [mailto:nhorman at tuxdriver.com] 
> Sent: Friday, November 14, 2014 9:54 PM
> To: Bruce Richardson
> Cc: Chi, Xiaobo (NSN - CN/Hangzhou); dev at dpdk.org
> Subject: Re: [dpdk-dev] one lightwight rte_eal_init() for SECONDARY processes 
> which only use sharedmemory
> 
> On Fri, Nov 14, 2014 at 10:48:21AM +, Bruce Richardson wrote:
> > On Wed, Nov 12, 2014 at 03:22:37AM +, Chi, Xiaobo (NSN - CN/Hangzhou) 
> > wrote:
> > > Hi,
> > > Background:
> > > What we are doing now is port make telecom network element to be cloud 
> > > based.  For one of our product,  DPDK is applied not only for 
> > > fastpath/dataplane processing, but also for Distributed Message eXchange 
> > > (DMX) between different processes/applications which may located in 
> > > different VM even different host.  for such a DMX system, in one VM, we 
> > > have one DPDK based dmxdemo (which acts as the PRIMARY) which is in 
> > > charge of distribute message between different applications, and dozens 
> > > of applications (act as SECONDARY) to use DPDK based 
> > > rte_tx_ring/rte_rx_ring/mempool/memzone to send receive messages to 
> > > dmxdemo.
> > > 
> > > Problem:
> > > Here, these DPDK based SECONDARY processes need only the DPDK's hugepage 
> > > based sharememory mechanism and it's upper libs (such as ring, mempool, 
> > > etc.), they need not cpu core pinning, iopl privilege changing , pci 
> > > device, timer, alarm, interrupt, shared_driver_list,  core_info, threads 
> > > for each core, etc. Then, for such kind of SECONDARY processes, the 
> > > current rte_eal_init() is too heavy.
> > > I have seen some others also met similar troubles.
> > > 
> > > Solution:
> > > I write one light weight rte_eal_init(), called 
> > > rte_eal_secondary_mem_init() as following.  It only initializes shared 
> > > memory and mandatory resources. I expect your review and hope these code 
> > > can be merged into DPDK main branch.
> > >
> > 
> > Rather than writing a whole new API and arg parsing function, might it be 
> > better
> > to implement this just as an additional EAL flag that prevents the existing
> > eal_init function from doing all tasks?
> > 
> > /Bruce
> > 
> +1, that seems like a more compatible approach.
> Neil
> 
> > 
> > > static void eal_secondary_mem_parse_args(int argc, char **argv)
> > > {
> > > static struct option lgopts[] = {
> > > {OPT_HUGE_DIR, 1, 0, 0},
> > > {OPT_FILE_PREFIX, 1, 0, 0},
> > > {0, 0, 0, 0}
> > > };
> > > 
> > > int opt;
> > > int option_index;
> > > 
> > > while ((opt = getopt_long(argc, argv, "", lgopts, &option_index)) 
> > > != EOF) {
> > > 
> > > if (!opt ) {
> > >  if (!strcmp(lgopts[option_index].name, OPT_HUGE_DIR)) {
> > > internal_config.hugepage_dir = optarg;
> > >  }
> > >  else if (!strcmp(lgopts[option_index].name, OPT_FILE_PREFIX)) {
> > > internal_config.hugefile_prefix = optarg;
> > >  }
> > >   }
> > >}
> > > }
> > > 
> > > int rte_eal_secondary_mem_init( int argc, char **argv )
> > > {
> > > static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
> > > const char *logid;
> > > 
> > > if (!rte_atomic32_test_and_set(&run_once))
> > > return -1;
> > > 
> > > logid = strrchr(argv[0], '/');
> > > logid = strdup(logid ? logid + 1: argv[0]);
> > > 
> > > if (rte_eal_log_early_init() < 0)
> > > rte_panic("Cannot init early logs\n");
> > > 
> > >memset( &internal_config, 0, sizeof( struct internal_config ) );
> > >/*this is only for secondary PRBs */
> > >internal_config.process_type = RTE_PROC_SECONDARY;
> > > internal_config.hugefile_prefix = HUGEFILE_PREFIX_DEFAULT;
> > > internal_config.hugepage_dir = NULL;
> > >/* user can freely define the hugefile_prefix and hugepage_dir */
> > >eal_secondary_mem_parse_args( argc, argv );
> > > 
> > >RTE_LOG(INFO, EAL, "prefix=%s, 
> > > dir=%s.\n",internal_config.hugefile_prefix, internal_config.hugepage_dir 
> > > );
> > > 
> > >/* To share memory config with PRIMARY process */
> > >internal_config.no_shconf = 0;
> > >rte_config_init();

[dpdk-dev] [PATCH v2] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices

2014-11-17 Thread Neil Horman
On Thu, Nov 13, 2014 at 12:57:25PM +0100, Thomas Monjalon wrote:
> 2014-11-13 06:14, Neil Horman:
> > On Thu, Nov 13, 2014 at 02:03:18AM -0800, Thomas Monjalon wrote:
> > > 2014-10-08 15:14, Neil Horman:
> > > > On Wed, Oct 08, 2014 at 05:57:46PM +0200, Thomas Monjalon wrote:
> > > > > 2014-09-29 11:05, Bruce Richardson:
> > > > > > On Fri, Sep 26, 2014 at 10:08:55AM -0400, Neil Horman wrote:
> > > > > > > On Fri, Sep 26, 2014 at 11:28:05AM +0200, Thomas Monjalon wrote:
> > > > > > > > 3) There is no test associated with this PMD.
> > > > > > > That would have been a great comment to make a few months back, 
> > > > > > > though whats
> > > > > > > wrong with testpmd here?  That seems to be the same test that 
> > > > > > > every other pmd
> > > > > > > uses. What exactly are you looking for?
> > > > > 
> > > > > I was thinking of testing behaviour with different kernel 
> > > > > configurations and
> > > > > unit tests for --vdev options. But it's not a major blocker.
> > > > > 
> > > > Thats fine with me.  If theres a set of unit tests that you have 
> > > > documentation
> > > > for, I'm sure we would be happy to run them.  I presume you just want 
> > > > all the
> > > > pmd vdev option exercised?  Any specific sets of kernel configurations?
> > > 
> > > I don't really know which tests are needed. It could be a mix of unit 
> > > tests
> > > and functionnal tests described in a test plan.
> > > The goal is to be able to validate the behaviour and check there is no
> > > regression. Ideally some corner cases could be described.
> > > I'm OK to integrate it as is. But future maintenance will probably need
> > > such inputs for validation tests.
> > > 
Apologies for the delay on this, its been a busy time lately.

> > Do you have an example set of tests that the other pmd's have followed for 
> > this?
> 
> You can check this:
>   http://dpdk.org/browse/tools/dts/tree/test_plans/pmd_test_plan.rst
>   
> http://dpdk.org/browse/tools/dts/tree/test_plans/pmd_bonded_test_plan.rst
> 
Looking at this, the pmd_test_plan above seems perfectly applicable to Johns
pmd.  did you feel as though additional tests were needed for a virutal pmd
(asside from a note describing the additional --vdev parameter required for
virtual device setup?

I'll have a renamed device pmd patch up later today.

Neil

> As I said, we can integrate AF_PACKET PMD without such test plan.
> But we are going to improve testing of many areas in DPDK.
> 
> > > > > If RedHat is committed for its maintenance, it could integrated in 
> > > > > release 1.8.
> > > > > But I'd like it to be renamed as pmd_af_packet (or a better name) 
> > > > > instead of
> > > > > pmd_packet.
> > > > > 
> > > > John L. is on his way to plumbers at the moment, so is unable to 
> > > > comment, but
> > > > I'll try to get a few cycles to change the name of the PMD around.  And 
> > > > yes, I
> > > > thought that maintenance was implicit.  He's the author, of course 
> > > > he'll take
> > > > care of it :).  And I'll be glad to help
> > > 
> > > Do you have time in coming days to rebase and rename this PMD for 
> > > inclusion
> > > in 1.8.0 release?
> 
> Do you think a sub-tree with pull request model would help you for
> maintenance of this PMD?
> 
> -- 
> Thomas
> 


[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-17 Thread Olivier MATZ
Hi Jijiang,

On 11/17/2014 07:52 AM, Liu, Jijiang wrote:
> Anyway, I explain the checksum mechanism here again.
> 
> In my VXLAN patch set, for an VXLAN packet TX checksum offload,  the logics 
> below:
> 
> 1. only set outer L3/L4 header TX checksum
> tx_checksum set 0x3(0r 0x1) 0
>   In this case, the PKT_TX_VXLAN_CKSUM flag is not set as we don't set inner 
> flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM), so we don't need to change inner 
> ones, driver think it is the ordinary packet,  
> HW will do outer L3/L4 checksum offload. 

Let's take an example with the following packet:
Ether / IP / UDP / VxLAN / Ether / IP / UDP / data

The original behavior (without your vxlan patches), which still
works today, is to select inner or outer using the m->l2_len field:

  - checksum outer IP + UDP
m->l2_len=14 m->l3_len=20
flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM

  - checksum inner IP + UDP
m->l2_len=64 m->l3_len=20
flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
of course, the packet is valid only if the outer IP checksum is
already correct and outer UDP checksum is 0

If i40e does not act like this, it does not follow the previous API.

> 2. only set inner L3/L4 header TX checksum
> tx_checksum set 0x30 0
>   In this case, the PKT_TX_VXLAN_CKSUM flag is set, so driver think it is 
> VXLAN packet, and we don't need to change outer ones because we don't set 
> outer flags here (PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM).

As explained above, there is no need to set the PKT_TX_VXLAN_CKSUM if
you only want to set the inner L3/L4 checksum. This was already working
like this before your patches, as long as l2_len and l3_len are set
properly in the mbuf (l2_len should include the outer headers).

Moreover, PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM, ... are not "outer flags".
They are hardware checksum flags, and before your vxland patch, they
concerned the headers referenced by m->l2_len and m->l3_len.

With your vxlan patch, it changed without beeing documented. These
flags use either (m->l2_len, m->l3_len) or (m->inner_l2_len,
m->inner_l3_len), which is not a good idea in my opinion.

> 3. set outer L3/L4 TX checksum and inner L3&L4 TX checksum 
> tx_checksum set 0x31(0x33) 0
> in this case, the PKT_TX_VXLAN_CKSUM flag is set, driver think it is VXLAN 
> packet, and we need to change outer and inner as both outer and inner flags 
> are set.

Here you are talking about test pmd flags. You do not describe the
mbuf API: PKT_TX_* flags and lengths values that need to be set (l2_len,
l3_len, ...) and to what they refer to.

I think if you want to explain the vxlan checksum offload mbuf API,
you should not talk about the testpmd flags as:
- they don't match the mbuf flags
- they have undocumented (or uncoherent) behavior in the csumonly
  forward engine

After several exchanges about this vxlan API.
Unfortunately, it is still vague and obscure to me.

So here is a proposition of API documentation that looks
understandable. Maybe it is easier to change the code to match this API:



PKT_TX_IP_CKSUM flag enables hardware computation of IP cksum. To
use it:
- fill l2_len and l3_len in mbuf
- set the flag PKT_TX_IP_CKSUM
- set the ip checksum to 0 in IP header
See (1) and (2).

One value among PKT_TX_L4_NO_CKSUM, PKT_TX_UDP_CKSUM,
PKT_TX_TCP_CKSUM and PKT_TX_SCTP_CKSUM can be assigned to the bits
of PKT_TX_L4_MASK. These flags are used to offload the L4 checksum in
hardware.
The user requires to:
- fill l2_len and l3_len in mbuf
- set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or
  PKT_TX_UDP_CKSUM
- calculate the pseudo header checksum and set it in the L4
  header (only for TCP or UDP). See rte_ipv4_phdr_cksum() and
  rte_ipv6_phdr_cksum().  For SCTP, set the crc field to 0.
See (1) and (2).

The PKT_TX_TCP_SEG flag can be set to enable TCP segmentation
offload for a packet to be transmitted on hardware supporting
TSO:
- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag
  implies PKT_TX_TCP_CKSUM)
- if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP
  checksum to 0 in the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len,
  tso_segsz
- calculate the pseudo header checksum without taking ip_len in
  accound, and set it in the TCP header. Refer to
  rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum() that can be
  used as helpers.
See (1) and (2).

(1) In case the packet is an encapsulated packet, the m->l2_len
field can include all the outer tunnel headers. These headers
will remain unmodified by the hardware.

(2) If outer_l2_len and outer_l3_len are not 0, the beginning of
the inner headers is relative to outer_l2_len + outer_l3_len.


[To replace the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags]

PKT_TX_OUTER_IP_CKSUM flag enables hardware computation of IP cksum
in outer headers. To use it:
- fill outer_l2_len and outer_l3_len in mbuf
- set the flag PKT_TX_OUTER_IP_CKSUM
- set the ip checksum to 0 in outer IP header

One value among PKT_TX_OUTER_L4_NO_CKSUM, PKT_TX_OUTER_U

[dpdk-dev] [PATCH v2] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices

2014-11-17 Thread Thomas Monjalon
2014-11-17 06:19, Neil Horman:
> On Thu, Nov 13, 2014 at 12:57:25PM +0100, Thomas Monjalon wrote:
> > 2014-11-13 06:14, Neil Horman:
> > > Do you have an example set of tests that the other pmd's have followed 
> > > for this?
> > 
> > You can check this:
> > http://dpdk.org/browse/tools/dts/tree/test_plans/pmd_test_plan.rst
> > 
> > http://dpdk.org/browse/tools/dts/tree/test_plans/pmd_bonded_test_plan.rst
> > 
> Looking at this, the pmd_test_plan above seems perfectly applicable to Johns
> pmd.  did you feel as though additional tests were needed for a virutal pmd
> (asside from a note describing the additional --vdev parameter required for
> virtual device setup?

It's maybe sufficient. I didn't dig enough. We'll see wether some people need
more for validation.

> I'll have a renamed device pmd patch up later today.

Excellent.

Thanks
-- 
Thomas


[dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent

2014-11-17 Thread Neil Horman
On Sun, Nov 16, 2014 at 11:59:16PM +0600, Yerden Zhumabekov wrote:
> This is a rework of my previous patches improving performance of 
> rte_hash_crc. In addition, this revision brings a fallback mechanism to 
> ensure that CRC32 hash is calculated regardless of hardware support from CPU 
> (i.e. SSE4.2 intrinsics).
> 
> Summary of changes:
> * added CRC32 software implementation, which is used as a fallback in case 
> SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
> * reworked rte_hash_crc() function which leverages both versions of CRC32 
> hash calculation functions with 4 and 8-byte operands.
> 
> Patches were tested on machines either with and without SSE4.2 support. 
> Software implementation seems to be about 15 times slower than SSE4.2-enabled 
> one. Of course, they return identical results.
> 
> Yerden Zhumabekov (4):
>   hash: add software CRC32 implementation
>   hash: add new rte_hash_crc_8byte call
>   hash: add fallback to software CRC32 implementation
>   hash: rte_hash_crc() slices data into 8-byte pieces
> 
>  lib/librte_hash/rte_hash_crc.h |  212 
> ++--
>  1 file changed, 202 insertions(+), 10 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> 
Functionally this all looks great, but I think you want to add a 5th patch to
the series in which you remove the ifdef SSE4.2 bits from test_hash_perf, since
this makes rte_hash_crc usable in all cases.  Not sure if you would rather just
ditch rte_hash_jhash alltogether, or make testing it a command line runtime
option

Neil



[dpdk-dev] [RFC PATCH] Enable uio_pci_generic support

2014-11-17 Thread Bruce Richardson
On Tue, Oct 28, 2014 at 01:48:02AM +0800, Danny Zhou wrote:
> Linux kernel provides UIO as well as VFIO mechanism to support writing user
> space device driver. Comparing to UIO which is available since 2.6.32 kernel,
> the VFIO is introduced into kernel since version 3.6.0 with better interrupt
> and memory protection (build on top of Intel VT-d technology) supports.
> Basically, UIO and VFIO do two common things below:
> 1) Map PCIe device's I/O memory space to user space driver
> 2) Support device interrupt notification mechanism allows user space
>driver/application is notified when device interrupt triggers.
> 
> To run an DPDK application and make use of VFIO, two in_kernel modules
> vfio and vfio-pci module must be loaded. But to use UIO, a DPDK kernel
> module igb_uio, which was there since DPDK is invented, must be loaded to
> attach to in_kernel uio module. As an possible solution to deprecate igb_uio, 
> this RFC patch leverages uio_pci_generic this in_kernel module to support
> DPDK user space PMD in a generic fashion (like how VFIO works today), to
> partially (DPDK KNI module rte_kni still sits in kernel) remove DPDK 
> dependency
> on GPL code igb_uio in kernel. Tests with uio_pci_generic shows performance
> remains same and LSC interrupts works as before.
> 
> Example to bind Network Ports to uio_pci_generic:
>   modprobe uio
>   modprobe uio_pci_generic
>   /* to bind device 08:00.0, to the uio_pci_generic driver */
>   ./tools/dpdk_nic_bind.py -b uio_pci_generic 08:00.0
> 
> Note: this RFC patch does not completely remove igb_uio support due to
> igb_uio supports creating maximum number of SR-IOV VFs (Virtual Functions)
> by using max_vfs kernel parameter on older kernels (kernel 3.7.x and below).
> Specifically, igb_uio explicitly calls pci_enable_sriov() to create VFs, while
> it is not invoked in either uio or uio_pci_generic. On kernel 3.8.x
> and above, user can use the standard sysfs to enable VFs. For examples:
> 
> #echo $num_vf_enabled > /sys/class/net/$dev/device/sriov_numvfs   //enable VFs
> #echo 0 > /sys/class/net/$dev/device/sriov_numvfs //disable 
> VFs
> 

Brilliant to see this patch set - this should allow us to run on just about
any kernel without out-of-tree drivers. If/when this gets merged in, I'd like 
to see
a follow-up patch to change the documentation to make the use of this in-tree
driver the default method of using DPDK rather igb_uio. (VFIO still has too
many limitations IMHO to make it the newbie patch).

Anyway, a few comments below.

/Bruce

> ---
>  lib/librte_eal/common/include/rte_pci.h|   1 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  67 +--
>  lib/librte_eal/linuxapp/eal/eal_pci.c  |   2 +-
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 203 
> -
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |   1 +
>  tools/dpdk_nic_bind.py |   2 +-
>  6 files changed, 214 insertions(+), 62 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_pci.h 
> b/lib/librte_eal/common/include/rte_pci.h
> index 66ed793..71ca882 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -148,6 +148,7 @@ struct rte_pci_device {
>   struct rte_pci_id id;   /**< PCI ID. */
>   struct rte_pci_resource mem_resource[PCI_MAX_RESOURCE];   /**< PCI 
> Memory Resource */
>   struct rte_intr_handle intr_handle; /**< Interrupt handle */
> + char driver_name[BUFSIZ];   /**< driver name */
>   const struct rte_pci_driver *driver;/**< Associated driver */
>   uint16_t max_vfs;   /**< sriov enable if not zero */
>   int numa_node;  /**< NUMA node connection */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index dc2668a..f2da778 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -361,6 +361,53 @@ vfio_disable_msix(struct rte_intr_handle *intr_handle) {
>  }
>  #endif
>  
> +static int
> +uio_intr_enable(struct rte_intr_handle *intr_handle)
> +{
> + unsigned char command_high;
> +
> + /* use config file descriptor for uio_pci_generic */
> + if (pread(intr_handle->uio_cfg_fd, &command_high, 1, 5) != 1) {
> + RTE_LOG(ERR, EAL,
> + "Error reading interrupts status for fd %d\n",
> + intr_handle->uio_cfg_fd);
> + return -1;
> + }
> + /* disable interrupts */
> + command_high |= 0x4;
> + if (pwrite(intr_handle->uio_cfg_fd, &command_high, 1, 5) != 1) {
> + RTE_LOG(ERR, EAL,
> + "Error disabling interrupts for fd %d\n",
> + intr_handle->uio_cfg_fd);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int

[dpdk-dev] [PATCH v2 0/4] rte_hash_crc reworked to be platform-independent

2014-11-17 Thread Yerden Zhumabekov

17.11.2014 17:31, Neil Horman ?:
> On Sun, Nov 16, 2014 at 11:59:16PM +0600, Yerden Zhumabekov wrote:
>> This is a rework of my previous patches improving performance of 
>> rte_hash_crc. In addition, this revision brings a fallback mechanism to 
>> ensure that CRC32 hash is calculated regardless of hardware support from CPU 
>> (i.e. SSE4.2 intrinsics).
>>
>> Summary of changes:
>> * added CRC32 software implementation, which is used as a fallback in case 
>> SSE4.2 is not available, or if SSE4.2 is intentionally disabled.
>> * added rte_hash_crc_set_alg() function to control availability of SSE4.2.
>> * added rte_hash_crc_8byte() function to calculate CRC32 on 8-byte operand.
>> * reworked rte_hash_crc() function which leverages both versions of CRC32 
>> hash calculation functions with 4 and 8-byte operands.
>>
>> Patches were tested on machines either with and without SSE4.2 support. 
>> Software implementation seems to be about 15 times slower than 
>> SSE4.2-enabled one. Of course, they return identical results.
>>
>> Yerden Zhumabekov (4):
>>   hash: add software CRC32 implementation
>>   hash: add new rte_hash_crc_8byte call
>>   hash: add fallback to software CRC32 implementation
>>   hash: rte_hash_crc() slices data into 8-byte pieces
>>
>>  lib/librte_hash/rte_hash_crc.h |  212 
>> ++--
>>  1 file changed, 202 insertions(+), 10 deletions(-)
>>
>> -- 
>> 1.7.9.5
>>
>>
> Functionally this all looks great, but I think you want to add a 5th patch to
> the series in which you remove the ifdef SSE4.2 bits from test_hash_perf, 
> since
> this makes rte_hash_crc usable in all cases.  Not sure if you would rather 
> just
> ditch rte_hash_jhash alltogether, or make testing it a command line runtime
> option

Meanwhile, I've borrowed some Intel's code (BSD licensed) for CRC32 sw
algorithm, it runs 4 times faster sacrificing memory (2K) for additional
lookup tables. I'd like to include it as well. As for test_hash_perf,
I'll look at it.
Should I just send new series over as 'v3'? Any approval/disapproval for
the current series?

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ




[dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation

2014-11-17 Thread Ananyev, Konstantin

Hi Yerden,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yerden Zhumabekov
> Sent: Sunday, November 16, 2014 5:59 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 
> implementation
> 
> Initially, SSE4.2 support is detected via CPUID instruction.
> 
> Added rte_hash_crc_set_alg() function to detect and set CRC32
> implementation if necessary. SSE4.2 is allowed by default. If it's
> not available, fall back to sw implementation.
> 
> Signed-off-by: Yerden Zhumabekov 
> ---
>  lib/librte_hash/rte_hash_crc.h |   60 
> ++--
>  1 file changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> index 74e2d92..178b162 100644
> --- a/lib/librte_hash/rte_hash_crc.h
> +++ b/lib/librte_hash/rte_hash_crc.h
> @@ -45,7 +45,11 @@ extern "C" {
>  #endif
> 
>  #include 
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
>  #include 
> +#endif
> +#include 
> +#include 
> 
>  /* Lookup table for software implementation of CRC32C */
>  static const uint32_t crc32c_table[256] = {
> @@ -152,8 +156,42 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>   return init_val;
>  }
> 
> +enum crc32_alg_t {
> + CRC32_SW = 0,
> + CRC32_SSE42,
> + CRC32_AUTODETECT
> +};
> +
> +/* Default algorithm is left for autodetection,
> + * it is detected on first run of hash function
> + */
> +static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
> +
> +/**
> + * Allow or disallow use of SSE4.2 instrinsics for CRC32 hash
> + * hash calculation.
> + *
> + * @param flag
> + *   unsigned integer flag
> + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> + */
> +static inline void
> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> +{
> + int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> + enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> +
> + if (alg == CRC32_SSE42)
> + crc32_alg = alg_supp;
> + else
> + crc32_alg = CRC32_SW;
> +}
> +

Wonder can we define that function with __attribute__((constructor))?
Then, I suppose we can remove CRC32_AUTODETECT, and remove:
if (unlikely(crc32_alg == CRC32_AUTODETECT))
   rte_hash_crc_set_alg(CRC32_SSE42);   
from rte_hash_crc_*byte().

Konstantin

>  /**
>   * Use single crc32 instruction to perform a hash on a 4 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -165,11 +203,21 @@ crc32c_2words(uint64_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  {
> - return _mm_crc32_u32(init_val, data);
> + if (unlikely(crc32_alg == CRC32_AUTODETECT))
> + rte_hash_crc_set_alg(CRC32_SSE42);
> +
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> + if (likely(crc32_alg == CRC32_SSE42))
> + return _mm_crc32_u32(init_val, data);
> +#endif
> +
> + return crc32c_1word(data, init_val);
>  }
> 
>  /**
>   * Use single crc32 instruction to perform a hash on a 8 byte value.
> + * Fall back to software crc32 implementation in case SSE4.2 is
> + * not supported
>   *
>   * @param data
>   *   Data to perform hash on.
> @@ -181,7 +229,15 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
>  static inline uint32_t
>  rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
>  {
> - return _mm_crc32_u64(init_val, data);
> + if (unlikely(crc32_alg == CRC32_AUTODETECT))
> + rte_hash_crc_set_alg(CRC32_SSE42);
> +
> +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> + if (likely(crc32_alg == CRC32_SSE42))
> + return _mm_crc32_u64(init_val, data);
> +#endif
> +
> + return crc32c_2words(data, init_val);
>  }
> 
>  /**
> --
> 1.7.9.5



[dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation

2014-11-17 Thread Yerden Zhumabekov

17.11.2014 18:34, Ananyev, Konstantin ?:
> Hi Yerden,
>
>> +static inline void
>> +rte_hash_crc_set_alg(enum crc32_alg_t alg)
>> +{
>> +int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
>> +enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
>> +
>> +if (alg == CRC32_SSE42)
>> +crc32_alg = alg_supp;
>> +else
>> +crc32_alg = CRC32_SW;
>> +}
>> +
> Wonder can we define that function with __attribute__((constructor))?
> Then, I suppose we can remove CRC32_AUTODETECT, and remove:
> if (unlikely(crc32_alg == CRC32_AUTODETECT))
>rte_hash_crc_set_alg(CRC32_SSE42);   
> from rte_hash_crc_*byte().
Nice feature  I was unfamiliar with :)
Since I'm going to revise the patch series anyway, I'll apply it and
test. Thank you.

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ



[dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag

2014-11-17 Thread Olivier MATZ
Hi Bruce,

On 11/17/2014 11:39 AM, Bruce Richardson wrote:
>> +/*
>> + * Get the name of a TX offload flag
>> + */
>> +const char *rte_get_tx_ol_flag_name(uint64_t mask)
>> +{
>> +switch (mask) {
>> +case PKT_TX_VLAN_PKT: return "PKT_TX_VLAN_PKT";
>> +case PKT_TX_IP_CKSUM: return "PKT_TX_IP_CKSUM";
>> +case PKT_TX_TCP_CKSUM: return "PKT_TX_TCP_CKSUM";
>> +case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
>> +case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>> +case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>> +case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
>> +default: return NULL;
>> +}
>> +}
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 68fb988..e76617f 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -129,6 +129,28 @@ extern "C" {
>>  /* Use final bit of flags to indicate a control mbuf */
>>  #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */
>>  
> 
> I think this patch should perhaps also add to a comment at the top of the list
> of flags that any new flags added should also be added to the appropriate
> function in rte_mbuf.c. Having the comment in rte_mbuf.h where people would 
> add the flags
> should help remind people to keep the flag lists in sync.

Good idea, I'll add it.

Regards,
Olivier


[dpdk-dev] [PATCH v2 08/13] testpmd: rework csum forward engine

2014-11-17 Thread Olivier MATZ
Hi Jijiang,

On 11/17/2014 09:11 AM, Liu, Jijiang wrote:
>> +/* Calculate the checksum of outer header (only vxlan is supported,
>> + * meaning IP + UDP). The caller already checked that it's a vxlan
>> + * packet */
>> +static uint64_t
>> +process_outer_cksums(void *outer_l3_hdr, uint16_t outer_ethertype,
>> +uint16_t outer_l3_len, uint16_t testpmd_ol_flags) {
>> +struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
>> +struct ipv6_hdr *ipv6_hdr = outer_l3_hdr;
>> +struct udp_hdr *udp_hdr;
>> +uint64_t ol_flags = 0;
>> +
>> +if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
>> +ol_flags |= PKT_TX_VXLAN_CKSUM;
>> +
>> +if (outer_ethertype == _htons(ETHER_TYPE_IPv4)) {
>> +ipv4_hdr->hdr_checksum = 0;
>> +
>> +if ((testpmd_ol_flags & TESTPMD_TX_OFFLOAD_VXLAN_CKSUM)
>> == 0)
>> +ipv4_hdr->hdr_checksum = get_ipv4_cksum(ipv4_hdr);
>> +}
> 
> As I mentioned, we should use TESTPMD_TX_OFFLOAD_IP_CKSUM instead of using 
> TESTPMD_TX_OFFLOAD_VXLAN_CKSUM flag to check if we need to set outer IP 
> checksum offload.
> In other words, even if VXLAN packet, outer IP TX checksum offload is also 
> needed if  TESTPMD_TX_OFFLOAD_IP_CKSUM is set.

The csum forward engine works as follow after the rework. I can add
some more comments in the patch or testpmd help to describe it more
clearly.

Receive a burst of packets, and for each packet:
 - parse packet, and try to recognize a supported packet type (1)
 - if it's not a supported packet type, don't touch the packet, else:
 - modify the IPs in inner headers and in outer headers if any
 - reprocess the checksum of all supported layers. This is done in SW
   or HW, depending on testpmd command line configuration
 - if TSO is enabled in testpmd command line, also flag the mbuf for TCP
   segmentation offload (this implies HW TCP checksum)
Then transmit packets on the output port.

(1) Supported packets are:
  Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
  Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
UDP|TCP|SCTP

The testpmd command line for csum takes the following arguments:
  tx_cksum set (ip|udp|tcp|sctp|vxlan) (hw|sw) (port_id)

- "ip|udp|tcp|sctp" always concern the inner layer
- "vxlan" concerns the outer IP and UDP layer (if packet is recognized
  as a vxlan packet)

Hope it's enough precisely described to be able to predict the output
of testpmd without reading the code or the i40e datasheets. This was
not so clear before.

So, following this description, there is not reason to check the
TESTPMD_TX_OFFLOAD_IP_CKSUM when scheduling the hardware VxLAN checksum.
One thing may be wrong however, it's the mbuf flags set in the packet.
But we cannot say it's wrong today because the API is not documented.
But the VXLAN feature is not enough documented to be sure it's wrong.


Regards,
Olivier


[dpdk-dev] [PATCH v6 0/8] support of multiple sizes of redirection table

2014-11-17 Thread Ananyev, Konstantin


> -Original Message-
> From: Zhang, Helin
> Sent: Saturday, November 15, 2014 4:04 PM
> To: dev at dpdk.org
> Cc: Cao, Waterman; Cao, Min; Wu, Jingjing; Liu, Jijiang; Chen, Erlu; Ananyev, 
> Konstantin; Zhang, Helin
> Subject: [PATCH v6 0/8] support of multiple sizes of redirection table
> 
> As e1000, ixgbe and i40e hardware use different sizes of redirection
> table in PF or VF, ethdev and PMDs need to be reworked to support
> multiple sizes of that table. In addition, commands in testpmd also
> need to be reworked to support these changes.
> 
> v2 changes:
> * Reorganized the patches.
> * Added code style fixes.
> * Added support of reta updating/querying in i40e VF.
> 
> v3 changes:
> * Reorganized the patch set.
> * Added returning default RX/TX configurations in VF (igb/ixgbe/i40e),
>   as the patch set of it for PF has been accepted recently.
> 
> v4 changes:
> * Renamed RTE_BIT_WIDTH_64 to RTE_RETA_GROUP_SIZE.
> * Added more comments to rte_eth_dev_rss_reta_update() and
>   rte_eth_dev_rss_reta_query().
> 
> v5 changes:
> * Reworked the annotations of macros of RETA sizes in rte_ethdev.h.
> 
> v6 changes:
> * Checking if the input number of reta size is 64 aligned has been
>   added in rte_ethdev.c.
> * Use macros to replace numeric in all igb, ixgbe and i40e PMDs of
>   updating/querying reta.
> 
> Helin Zhang (8):
>   app/testpmd: code style fix
>   i40evf: code style fix
>   i40e: support of setting hash lookup table size
>   igb: implement ops of 'dev_infos_get' for PF and VF respectively
>   ixgbe: implement ops of 'dev_infos_get' for PF and VF respectively
>   i40e: rework of ops of 'dev_infos_get' for both PF and VF
>   ethdev: support of multiple sizes of redirection table
>   i40evf: support of updating/querying redirection table
> 
>  app/test-pmd/cmdline.c   | 166 
>  app/test-pmd/config.c|  37 +++
>  app/test-pmd/testpmd.h   |   4 +-
>  lib/librte_ether/rte_ethdev.c| 121 
>  lib/librte_ether/rte_ethdev.h|  51 ++---
>  lib/librte_pmd_e1000/igb_ethdev.c| 179 +++---
>  lib/librte_pmd_i40e/i40e_ethdev.c| 122 +++-
>  lib/librte_pmd_i40e/i40e_ethdev.h|  25 -
>  lib/librte_pmd_i40e/i40e_ethdev_vf.c | 124 -
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c  | 208 
> +++
>  10 files changed, 719 insertions(+), 318 deletions(-)
> 
> --
> 1.8.1.4

Acked-by: Konstantin Ananyev 



[dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 implementation

2014-11-17 Thread Neil Horman
On Mon, Nov 17, 2014 at 12:34:04PM +, Ananyev, Konstantin wrote:
> 
> Hi Yerden,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yerden Zhumabekov
> > Sent: Sunday, November 16, 2014 5:59 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 3/4] hash: add fallback to software CRC32 
> > implementation
> > 
> > Initially, SSE4.2 support is detected via CPUID instruction.
> > 
> > Added rte_hash_crc_set_alg() function to detect and set CRC32
> > implementation if necessary. SSE4.2 is allowed by default. If it's
> > not available, fall back to sw implementation.
> > 
> > Signed-off-by: Yerden Zhumabekov 
> > ---
> >  lib/librte_hash/rte_hash_crc.h |   60 
> > ++--
> >  1 file changed, 58 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_hash/rte_hash_crc.h b/lib/librte_hash/rte_hash_crc.h
> > index 74e2d92..178b162 100644
> > --- a/lib/librte_hash/rte_hash_crc.h
> > +++ b/lib/librte_hash/rte_hash_crc.h
> > @@ -45,7 +45,11 @@ extern "C" {
> >  #endif
> > 
> >  #include 
> > +#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> >  #include 
> > +#endif
> > +#include 
> > +#include 
> > 
> >  /* Lookup table for software implementation of CRC32C */
> >  static const uint32_t crc32c_table[256] = {
> > @@ -152,8 +156,42 @@ crc32c_2words(uint64_t data, uint32_t init_val)
> > return init_val;
> >  }
> > 
> > +enum crc32_alg_t {
> > +   CRC32_SW = 0,
> > +   CRC32_SSE42,
> > +   CRC32_AUTODETECT
> > +};
> > +
> > +/* Default algorithm is left for autodetection,
> > + * it is detected on first run of hash function
> > + */
> > +static enum crc32_alg_t crc32_alg = CRC32_AUTODETECT;
> > +
> > +/**
> > + * Allow or disallow use of SSE4.2 instrinsics for CRC32 hash
> > + * hash calculation.
> > + *
> > + * @param flag
> > + *   unsigned integer flag
> > + *   - (CRC32_SW) Don't use SSE4.2 intrinsics
> > + *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available, set by default
> > + */
> > +static inline void
> > +rte_hash_crc_set_alg(enum crc32_alg_t alg)
> > +{
> > +   int sse42_supp = rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2);
> > +   enum crc32_alg_t alg_supp = sse42_supp ? CRC32_SSE42 : CRC32_SW;
> > +
> > +   if (alg == CRC32_SSE42)
> > +   crc32_alg = alg_supp;
> > +   else
> > +   crc32_alg = CRC32_SW;
> > +}
> > +
> 
> Wonder can we define that function with __attribute__((constructor))?
> Then, I suppose we can remove CRC32_AUTODETECT, and remove:
> if (unlikely(crc32_alg == CRC32_AUTODETECT))
>rte_hash_crc_set_alg(CRC32_SSE42);   
> from rte_hash_crc_*byte().
> 
That would make sense.
Neil



[dpdk-dev] [PATCH] eal: allow virtual devices to be white/black listed

2014-11-17 Thread Neil Horman
On Sun, Nov 16, 2014 at 09:26:55PM +, Nicol?s Pernas Maradei wrote:
> From: Nicol?s Pernas Maradei 
> 
> Virtual and physical devices are now treated the same in terms of
> white/black listing. Virtual devices can be defined using --vdev as
> before and also whitelisted (using -w vdev_name) or blacklisted (using -b
> vdev_name). This allows the user to have only a virtual device (port) in
> use.
> 
> Signed-off-by: Nicol?s Pernas Maradei 
> ---
>  app/test-pmd/cmdline.c  |   5 +-
>  app/test/commands.c |   4 +-
>  app/test/test_devargs.c |  69 +-
>  app/test/test_pci.c |   3 +-
>  lib/librte_eal/bsdapp/eal/eal.c |   8 +-
>  lib/librte_eal/bsdapp/eal/eal_pci.c |   2 +-
>  lib/librte_eal/common/eal_common_dev.c  |  24 ++--
>  lib/librte_eal/common/eal_common_devargs.c  | 196 
> 
>  lib/librte_eal/common/eal_common_options.c  |  11 +-
>  lib/librte_eal/common/eal_common_pci.c  |  11 +-
>  lib/librte_eal/common/include/rte_devargs.h | 124 +-
>  lib/librte_eal/linuxapp/eal/eal.c   |   8 +-
>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   2 +-
>  13 files changed, 319 insertions(+), 148 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 4c3fc76..f6ba4fe 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -6844,6 +6844,8 @@ static void cmd_dump_parsed(void *parsed_result,
>   rte_mempool_list_dump(stdout);
>   else if (!strcmp(res->dump, "dump_devargs"))
>   rte_eal_devargs_dump(stdout);
> + else if (!strcmp(res->dump, "dump_vdevargs"))
> + rte_eal_vdevargs_dump(stdout);
>  }
>  
>  cmdline_parse_token_string_t cmd_dump_dump =
> @@ -6854,7 +6856,8 @@ cmdline_parse_token_string_t cmd_dump_dump =
>   "dump_struct_sizes#"
>   "dump_ring#"
>   "dump_mempool#"
> - "dump_devargs");
> + "dump_devargs#"
> + "dump_vdevargs");
>  
>  cmdline_parse_inst_t cmd_dump = {
>   .f = cmd_dump_parsed,  /* function to call */
> diff --git a/app/test/commands.c b/app/test/commands.c
> index 92a17ed..6799d63 100644
> --- a/app/test/commands.c
> +++ b/app/test/commands.c
> @@ -161,13 +161,15 @@ static void cmd_dump_parsed(void *parsed_result,
>   rte_mempool_list_dump(stdout);
>   else if (!strcmp(res->dump, "dump_devargs"))
>   rte_eal_devargs_dump(stdout);
> + else if (!strcmp(res->dump, "dump_vdevargs"))
> + rte_eal_vdevargs_dump(stdout);
>  }
>  
>  cmdline_parse_token_string_t cmd_dump_dump =
>   TOKEN_STRING_INITIALIZER(struct cmd_dump_result, dump,
>"dump_physmem#dump_memzone#dump_log_history#"
>"dump_struct_sizes#dump_ring#dump_mempool#"
> -  "dump_devargs");
> +  "dump_devargs#dump_vdevargs");
>  
>  cmdline_parse_inst_t cmd_dump = {
>   .f = cmd_dump_parsed,  /* function to call */
> diff --git a/app/test/test_devargs.c b/app/test/test_devargs.c
> index f0acf8e..8fb93df 100644
> --- a/app/test/test_devargs.c
> +++ b/app/test/test_devargs.c
> @@ -52,52 +52,60 @@ static void free_devargs_list(void)
>   }
>  }
>  
> +/* clear vdevargs list that was modified by the test */
> +static void free_vdevargs_list(void)
> +{
> + struct rte_vdevargs *vdevargs;
> +
> + while (!TAILQ_EMPTY(&vdevargs_list)) {
> + vdevargs = TAILQ_FIRST(&vdevargs_list);
> + TAILQ_REMOVE(&vdevargs_list, vdevargs, next);
> + free(vdevargs);
> + }
> +}
> +
>  static int
>  test_devargs(void)
>  {
>   struct rte_devargs_list save_devargs_list;
> + struct rte_vdevargs_list save_vdevargs_list;
>   struct rte_devargs *devargs;
> + struct rte_vdevargs *vdevargs;
>  
>   /* save the real devargs_list, it is restored at the end of the test */
>   save_devargs_list = devargs_list;
> + save_vdevargs_list = vdevargs_list;
>   TAILQ_INIT(&devargs_list);
> + TAILQ_INIT(&vdevargs_list);
>  
>   /* test valid cases */
> - if (rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_PCI, "08:00.1") < 0)
> - goto fail;
> - if (rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_PCI, ":5:00.0") < 0)
> - goto fail;
> - if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_PCI, "04:00.0,arg=val") 
> < 0)
> - goto fail;
> - if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_PCI, ":01:00.1") < 
> 0)
> + if (rte_eal_devargs_add(RTE_DEV_WHITELISTED, "08:00.1") < 0)
>   goto fail;
> - if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 2)
> + if (rte_eal_devargs_add(RTE_DEV_WHITELISTED, ":5:00.0") < 0)
>   goto fail;
> - if (rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 2)
> + 

[dpdk-dev] [PATCH] llib/ibrte_net: workaround to avoid macro conflict

2014-11-17 Thread Neil Horman
On Mon, Nov 03, 2014 at 08:41:53AM +0100, Thomas Monjalon wrote:
> 2014-10-09 07:29, Neil Horman:
> > On Thu, Oct 09, 2014 at 05:20:31AM +, Wu, Jingjing wrote:
> > > Hi, Neil
> > > 
> > > To have rte_ip.h include netinet/in.h directly is also a choice.
> > > 
> > > But netinet/in.h contains a lot of extra stuff, and these may be useless 
> > > some DPDK applications, such as classification.
> > > rte_ip.h provides a more simplify way for the IP protocol layer.
> > > 
> > Not sure what the relevance there is.  The definitions you want are
> > standardized, theres no need for the dpdk to re-invent that wheel.  Get them
> > from the system include file.  The fact that extra macros are available in
> > netinet.h is neither relevant or true (as you can't really say for certain 
> > what
> > an application will need).
> 
> Neil, Matthew,
> 
> I totally agree with your point of view.
> Please, could you propose a patch to fix this issue?
> 
Matthew, can you handle this please, I've got too much going on right now.
Neil

> Thanks
> -- 
> Thomas
> 


[dpdk-dev] [PATCH] eal: allow virtual devices to be white/black listed

2014-11-17 Thread Nicolas Pernas Maradei

On 17/11/14 14:10, Neil Horman wrote:
> What are the above pci bus/device/function tuples?
>
> Neil

Hi,

I'm not sure if I understand the question. If you mean what kind of 
devices those are, as in physical or virtual, they are physical. 
rte_eal_devargs_add() would parse the string as a PCI ID. If it succeed 
it'd add them as physical devices having the field is_vdev = 0.

Please let me know if I answered your question.

Thanks,
Nico.


[dpdk-dev] [PATCH v3] librte_pmd_af_packet: add PMD for AF_PACKET-based virtual devices

2014-11-17 Thread John W. Linville
This is a Linux-specific virtual PMD driver backed by an AF_PACKET
socket.  This implementation uses mmap'ed ring buffers to limit copying
and user/kernel transitions.  The PACKET_FANOUT_HASH behavior of
AF_PACKET is used for frame reception.  In the current implementation,
Tx and Rx queues are always paired, and therefore are always equal
in number -- changing this would be a Simple Matter Of Programming.

Interfaces of this type are created with a command line option like
"--vdev=eth_af_packet0,iface=...".  There are a number of options availabe
as arguments:

 - Interface is chosen by "iface" (required)
 - Number of queue pairs set by "qpairs" (optional, default: 1)
 - AF_PACKET MMAP block size set by "blocksz" (optional, default: 4096)
 - AF_PACKET MMAP frame size set by "framesz" (optional, default: 2048)
 - AF_PACKET MMAP frame count set by "framecnt" (optional, default: 512)

Signed-off-by: John W. Linville 
---
This PMD is intended to provide a means for using DPDK on a broad
range of hardware without hardware-specific PMDs and (hopefully)
with better performance than what PCAP offers in Linux.  This might
be useful as a development platform for DPDK applications when
DPDK-supported hardware is expensive or unavailable.

New in v3:

-- Adopt awkward renaming (pmd_packet -> pmd_af_packet) in hopes of
   getting something merged...

New in v2:

-- fixup some style issues found by check patch
-- use if_index as part of fanout group ID
-- set default number of queue pairs to 1

 config/common_bsdapp |   5 +
 config/common_linuxapp   |   5 +
 lib/Makefile |   1 +
 lib/librte_eal/linuxapp/eal/Makefile |   1 +
 lib/librte_pmd_af_packet/Makefile|  60 ++
 lib/librte_pmd_af_packet/rte_eth_af_packet.c | 826 +++
 lib/librte_pmd_af_packet/rte_eth_af_packet.h |  53 ++
 mk/rte.app.mk|   4 +
 8 files changed, 955 insertions(+)
 create mode 100644 lib/librte_pmd_af_packet/Makefile
 create mode 100644 lib/librte_pmd_af_packet/rte_eth_af_packet.c
 create mode 100644 lib/librte_pmd_af_packet/rte_eth_af_packet.h

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 57cad76ed918..cc56d4ee9d48 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -224,6 +224,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=y
 CONFIG_RTE_LIBRTE_PMD_BOND=y

 #
+# Compile software PMD backed by AF_PACKET sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 57b61c99be79..275bb630ef0f 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -247,6 +247,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=n
 CONFIG_RTE_LIBRTE_PMD_BOND=y

 #
+# Compile software PMD backed by AF_PACKET sockets (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+
+#
 # Compile Xen PMD
 #
 CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
diff --git a/lib/Makefile b/lib/Makefile
index e3237ff5fc30..204ef11d2e68 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += librte_pmd_af_packet
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index c99433eb276f..06c1dc51505e 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -44,6 +44,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_ether
 CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring
 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_pcap
+CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_af_packet
 CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_xenvirt
 CFLAGS += $(WERROR_FLAGS) -O3

diff --git a/lib/librte_pmd_af_packet/Makefile 
b/lib/librte_pmd_af_packet/Makefile
new file mode 100644
index ..6955e5c24a53
--- /dev/null
+++ b/lib/librte_pmd_af_packet/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014 John W. Linville 
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2014 6WIND S.A.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and 

[dpdk-dev] [PATCH] eal: allow virtual devices to be white/black listed

2014-11-17 Thread Neil Horman
On Mon, Nov 17, 2014 at 02:52:18PM +, Nicolas Pernas Maradei wrote:
> 
> On 17/11/14 14:10, Neil Horman wrote:
> >What are the above pci bus/device/function tuples?
> Hi,
> 
> I'm not sure if I understand the question. If you mean what kind of devices
> those are, as in physical or virtual, they are physical.
> rte_eal_devargs_add() would parse the string as a PCI ID. If it succeed it'd
> add them as physical devices having the field is_vdev = 0.
> 
> Please let me know if I answered your question.
> 
I get that, I was more asking, why those values?  They seem a bit magic to me,
and might benefit from some descriptive macros or comments so they make more
sense
Neil

> Thanks,
> Nico.
> 


[dpdk-dev] [PATCH] eal: allow virtual devices to be white/black listed

2014-11-17 Thread Nicolas Pernas Maradei

On 17/11/14 16:00, Neil Horman wrote:
> I get that, I was more asking, why those values?  They seem a bit magic to me,
> and might benefit from some descriptive macros or comments so they make more
> sense
> Neil
OK, I get you now.

Maybe the diff is not very clear. I just left the original calls to 
rte_eal_devargs_add() with the original values which they looked clear 
to me but I don't mind improving the comments a bit and as you 
suggested, adding in some macros.

I'll wait for some more feedback before submitting a second version of 
the patch.

Thanks,
Nico.


[dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits

2014-11-17 Thread Walukiewicz, Miroslaw


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Friday, November 14, 2014 6:03 PM
> To: dev at dpdk.org
> Cc: olivier.matz at 6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, Yong;
> jigsaw at gmail.com; Richardson, Bruce
> Subject: [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64
> bits
> 
> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
> packet flags are now 64 bits wide. Some occurences were forgotten in
> the ixgbe driver.

I think it should be present in separate patch. I do no not see any relation to 
TSO

> 
> Signed-off-by: Olivier Matz 
> Acked-by: Bruce Richardson 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index ecebbf6..7e470ce 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -817,7 +817,7 @@ end_of_tx:
>  static inline uint64_t
>  rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
>  {
> - uint16_t pkt_flags;
> + uint64_t pkt_flags;
> 
>   static uint64_t ip_pkt_types_map[16] = {
>   0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT,
> PKT_RX_IPV4_HDR_EXT,
> @@ -834,7 +834,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t
> hl_tp_rs)
>   };
> 
>  #ifdef RTE_LIBRTE_IEEE1588
> - static uint32_t ip_pkt_etqf_map[8] = {
> + static uint64_t ip_pkt_etqf_map[8] = {
>   0, 0, 0, PKT_RX_IEEE1588_PTP,
>   0, 0, 0, 0,
>   };
> @@ -903,7 +903,7 @@ ixgbe_rx_scan_hw_ring(struct igb_rx_queue *rxq)
>   struct igb_rx_entry *rxep;
>   struct rte_mbuf *mb;
>   uint16_t pkt_len;
> - uint16_t pkt_flags;
> + uint64_t pkt_flags;
>   int s[LOOK_AHEAD], nb_dd;
>   int i, j, nb_rx = 0;
> 
> @@ -1335,7 +1335,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>   uint16_t nb_rx;
>   uint16_t nb_hold;
>   uint16_t data_len;
> - uint16_t pkt_flags;
> + uint64_t pkt_flags;
> 
>   nb_rx = 0;
>   nb_hold = 0;
> @@ -1511,9 +1511,9 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>   first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
>   hlen_type_rss =
> rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
>   pkt_flags =
> rx_desc_hlen_type_rss_to_pkt_flags(hlen_type_rss);
> - pkt_flags = (uint16_t)(pkt_flags |
> + pkt_flags = (pkt_flags |
>   rx_desc_status_to_pkt_flags(staterr));
> - pkt_flags = (uint16_t)(pkt_flags |
> + pkt_flags = (pkt_flags |
>   rx_desc_error_to_pkt_flags(staterr));
>   first_seg->ol_flags = pkt_flags;
> 
> --
> 2.1.0



[dpdk-dev] [PATCH] eal: allow virtual devices to be white/black listed

2014-11-17 Thread Bruce Richardson
On Mon, Nov 17, 2014 at 04:40:19PM +, Nicolas Pernas Maradei wrote:
> 
> On 17/11/14 16:00, Neil Horman wrote:
> >I get that, I was more asking, why those values?  They seem a bit magic to 
> >me,
> >and might benefit from some descriptive macros or comments so they make more
> >sense
> >Neil
> OK, I get you now.
> 
> Maybe the diff is not very clear. I just left the original calls to
> rte_eal_devargs_add() with the original values which they looked clear to me
> but I don't mind improving the comments a bit and as you suggested, adding
> in some macros.
> 
> I'll wait for some more feedback before submitting a second version of the
> patch.
>

Those are just example values for testing the parsing functionality. I'm don't
think descriptive macros are particularly needed, though maybe a short comment
or two might help out. The specific numeric values themselves aren't really 
important,
they are just examples of different patterns.

/Bruce



[dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits

2014-11-17 Thread Olivier MATZ
Hi Miroslaw,

On 11/17/2014 05:47 PM, Walukiewicz, Miroslaw wrote:
> 
> 
>> -Original Message-
>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
>> Sent: Friday, November 14, 2014 6:03 PM
>> To: dev at dpdk.org
>> Cc: olivier.matz at 6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, 
>> Yong;
>> jigsaw at gmail.com; Richardson, Bruce
>> Subject: [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64
>> bits
>>
>> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
>> packet flags are now 64 bits wide. Some occurences were forgotten in
>> the ixgbe driver.
> 
> I think it should be present in separate patch. I do no not see any relation 
> to TSO

You are right, there is no relation with TSO. The reason why I initially
added it in the same patchset is because I discovered this bug while
implementing TSO and I wanted to avoid too much noise on the list.

I can take out some patches from the series, but maybe it's too late
and it would confuse patchwork.

Thomas, what do you think?

Regards,
Olivier


[dpdk-dev] [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 64 bits

2014-11-17 Thread Thomas Monjalon
2014-11-17 18:03, Olivier MATZ:
> Hi Miroslaw,
> 
> On 11/17/2014 05:47 PM, Walukiewicz, Miroslaw wrote:
> > 
> > 
> >> -Original Message-
> >> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> >> Sent: Friday, November 14, 2014 6:03 PM
> >> To: dev at dpdk.org
> >> Cc: olivier.matz at 6wind.com; Walukiewicz, Miroslaw; Liu, Jijiang; Liu, 
> >> Yong;
> >> jigsaw at gmail.com; Richardson, Bruce
> >> Subject: [PATCH v2 02/13] ixgbe: fix remaining pkt_flags variable size to 
> >> 64
> >> bits
> >>
> >> Since commit 4332beee9 "mbuf: expand ol_flags field to 64-bits", the
> >> packet flags are now 64 bits wide. Some occurences were forgotten in
> >> the ixgbe driver.
> > 
> > I think it should be present in separate patch. I do no not see any 
> > relation to TSO
> 
> You are right, there is no relation with TSO. The reason why I initially
> added it in the same patchset is because I discovered this bug while
> implementing TSO and I wanted to avoid too much noise on the list.
> 
> I can take out some patches from the series, but maybe it's too late
> and it would confuse patchwork.
> 
> Thomas, what do you think?

In general, it's better to have only one feature in a patchset.
It's not a real problem here because TSO is planned for release 1.8 and fixes
are also welcome. So all the patches should enter in the coming days.
By the way, there is no problem with patchwork. You are free to choose :)

-- 
Thomas


[dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API

2014-11-17 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev at dpdk.org
> Cc: jigsaw at gmail.com
> Subject: [dpdk-dev] [PATCH v2 09/13] mbuf: introduce new checksum API
> 
> Introduce new functions to calculate checksums. These new functions
> are derivated from the ones provided csumonly.c but slightly reworked.
> There is still some room for future optimization of these functions
> (maybe SSE/AVX, ...).
> 
> This API will be modified in tbe next commits by the introduction of
> TSO that requires a different pseudo header checksum to be set in the
> packet.
> 
> Signed-off-by: Olivier Matz 

Acked-by: Konstantin Ananyev 

Just 2 nits from me:

1)
> +static inline uint16_t
> +rte_raw_cksum(const char *buf, size_t len)
> +{
...
> + while (len >= 8) {
> + sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];

Can you put each expression into a new line?
sum += u16[0];
sum += u16[1];
...

To make it easier to read.
Or can it be rewritten just like:
sum = (uint32_t)u16[0] + u16[1] + u16[2] + u16[3];
here?

2) 
> + while (len >= 8) {
> + sum += u16[0]; sum += u16[1]; sum += u16[2]; sum += u16[3];
> + len -= 8;
> + u16 += 4;
> + }
> + while (len >= 2) {
> + sum += *u16;
> + len -= 2;
> + u16 += 1;
> + }

In the code above, probably use sizeof(u16[0]) wherever appropriate.
To make things a bit more clearer and consistent.
...
while (len >=  4 * sizeof(u16[0]))
len -= 4 * sizeof(u16[0]);
u16 += 4; 
...
Same for second loop.


> ---
>  app/test-pmd/csumonly.c| 133 ++---
>  lib/librte_mbuf/rte_mbuf.h |   3 +-
>  lib/librte_net/rte_ip.h| 179 
> +
>  3 files changed, 189 insertions(+), 126 deletions(-)
> 
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index dda5d9e..39f974d 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -86,137 +86,22 @@
>  #define _htons(x) (x)
>  #endif
> 
> -static inline uint16_t
> -get_16b_sum(uint16_t *ptr16, uint32_t nr)
> -{
> - uint32_t sum = 0;
> - while (nr > 1)
> - {
> - sum +=*ptr16;
> - nr -= sizeof(uint16_t);
> - ptr16++;
> - if (sum > UINT16_MAX)
> - sum -= UINT16_MAX;
> - }
> -
> - /* If length is in odd bytes */
> - if (nr)
> - sum += *((uint8_t*)ptr16);
> -
> - sum = ((sum & 0x) >> 16) + (sum & 0x);
> - sum &= 0x0;
> - return (uint16_t)sum;
> -}
> -
> -static inline uint16_t
> -get_ipv4_cksum(struct ipv4_hdr *ipv4_hdr)
> -{
> - uint16_t cksum;
> - cksum = get_16b_sum((uint16_t*)ipv4_hdr, sizeof(struct ipv4_hdr));
> - return (uint16_t)((cksum == 0x)?cksum:~cksum);
> -}
> -
> -
> -static inline uint16_t
> -get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr)
> -{
> - /* Pseudo Header for IPv4/UDP/TCP checksum */
> - union ipv4_psd_header {
> - struct {
> - uint32_t src_addr; /* IP address of source host. */
> - uint32_t dst_addr; /* IP address of destination 
> host(s). */
> - uint8_t  zero; /* zero. */
> - uint8_t  proto;/* L4 protocol type. */
> - uint16_t len;  /* L4 length. */
> - } __attribute__((__packed__));
> - uint16_t u16_arr[0];
> - } psd_hdr;
> -
> - psd_hdr.src_addr = ip_hdr->src_addr;
> - psd_hdr.dst_addr = ip_hdr->dst_addr;
> - psd_hdr.zero = 0;
> - psd_hdr.proto= ip_hdr->next_proto_id;
> - psd_hdr.len  = 
> rte_cpu_to_be_16((uint16_t)(rte_be_to_cpu_16(ip_hdr->total_length)
> - - sizeof(struct ipv4_hdr)));
> - return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
> -}
> -
> -static inline uint16_t
> -get_ipv6_psd_sum(struct ipv6_hdr *ip_hdr)
> -{
> - /* Pseudo Header for IPv6/UDP/TCP checksum */
> - union ipv6_psd_header {
> - struct {
> - uint8_t src_addr[16]; /* IP address of source host. */
> - uint8_t dst_addr[16]; /* IP address of destination 
> host(s). */
> - uint32_t len; /* L4 length. */
> - uint32_t proto;   /* L4 protocol - top 3 bytes must 
> be zero */
> - } __attribute__((__packed__));
> -
> - uint16_t u16_arr[0]; /* allow use as 16-bit values with safe 
> aliasing */
> - } psd_hdr;
> -
> - rte_memcpy(&psd_hdr.src_addr, ip_hdr->src_addr,
> - sizeof(ip_hdr->src_addr) + sizeof(ip_hdr->dst_addr));
> - psd_hdr.len   = ip_hdr->payload_len;
> - psd_hdr.proto = (ip_hdr->proto << 24);
> -
> - return get_16b_sum(psd_hdr.u16_arr, sizeof(psd_hdr));
> -}
> -
>  static uint16_t
>  get_psd_sum(void *l3_hdr, 

[dpdk-dev] [PATCH v2 11/13] ixgbe: support TCP segmentation offload

2014-11-17 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev at dpdk.org
> Cc: jigsaw at gmail.com
> Subject: [dpdk-dev] [PATCH v2 11/13] ixgbe: support TCP segmentation offload
> 
> Implement TSO (TCP segmentation offload) in ixgbe driver. The driver is
> now able to use PKT_TX_TCP_SEG mbuf flag and mbuf hardware offload infos
> (l2_len, l3_len, l4_len, tso_segsz) to configure the hardware support of
> TCP segmentation.
> 
> In ixgbe, when doing TSO, the IP length must not be included in the TCP
> pseudo header checksum. A new function ixgbe_fix_tcp_phdr_cksum() is
> used to fix the pseudo header checksum of the packet before giving it to
> the hardware.
> 
> In the patch, the tx_desc_cksum_flags_to_olinfo() and
> tx_desc_ol_flags_to_cmdtype() functions have been reworked to make them
> clearer. This should not impact performance as gcc (version 4.8 in my
> case) is smart enough to convert the tests into a code that does not
> contain any branch instruction.
> 
> Signed-off-by: Olivier Matz 

Acked-by: Konstantin Ananyev 

Just one thing - double semicolon -  looks like a typo:
> + /* check if TCP segmentation required for this packet */
> + if (ol_flags & PKT_TX_TCP_SEG) {
> + /* implies IP cksum and TCP cksum */
> + type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
> + IXGBE_ADVTXD_TUCMD_L4T_TCP |
> + IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;;


> ---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   3 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 169 
> ++--
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |  19 ++--
>  3 files changed, 117 insertions(+), 74 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 2eb609c..2c2ecc0 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -1964,7 +1964,8 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
> rte_eth_dev_info *dev_info)
>   DEV_TX_OFFLOAD_IPV4_CKSUM  |
>   DEV_TX_OFFLOAD_UDP_CKSUM   |
>   DEV_TX_OFFLOAD_TCP_CKSUM   |
> - DEV_TX_OFFLOAD_SCTP_CKSUM;
> + DEV_TX_OFFLOAD_SCTP_CKSUM  |
> + DEV_TX_OFFLOAD_TCP_TSO;
> 
>   dev_info->default_rxconf = (struct rte_eth_rxconf) {
>   .rx_thresh = {
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index 2df3385..19e3b73 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -94,7 +94,8 @@
>  #define IXGBE_TX_OFFLOAD_MASK (   \
>   PKT_TX_VLAN_PKT |\
>   PKT_TX_IP_CKSUM |\
> - PKT_TX_L4_MASK)
> + PKT_TX_L4_MASK | \
> + PKT_TX_TCP_SEG)
> 
>  static inline struct rte_mbuf *
>  rte_rxmbuf_alloc(struct rte_mempool *mp)
> @@ -363,59 +364,84 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf 
> **tx_pkts,
>  static inline void
>  ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
>   volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
> - uint64_t ol_flags, uint32_t vlan_macip_lens)
> + uint64_t ol_flags, union ixgbe_tx_offload tx_offload)
>  {
>   uint32_t type_tucmd_mlhl;
> - uint32_t mss_l4len_idx;
> + uint32_t mss_l4len_idx = 0;
>   uint32_t ctx_idx;
> - uint32_t cmp_mask;
> + uint32_t vlan_macip_lens;
> + union ixgbe_tx_offload tx_offload_mask;
> 
>   ctx_idx = txq->ctx_curr;
> - cmp_mask = 0;
> + tx_offload_mask.data = 0;
>   type_tucmd_mlhl = 0;
> 
> + /* Specify which HW CTX to upload. */
> + mss_l4len_idx |= (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
> +
>   if (ol_flags & PKT_TX_VLAN_PKT) {
> - cmp_mask |= TX_VLAN_CMP_MASK;
> + tx_offload_mask.vlan_tci = ~0;
>   }
> 
> - if (ol_flags & PKT_TX_IP_CKSUM) {
> - type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4;
> - cmp_mask |= TX_MACIP_LEN_CMP_MASK;
> - }
> + /* check if TCP segmentation required for this packet */
> + if (ol_flags & PKT_TX_TCP_SEG) {
> + /* implies IP cksum and TCP cksum */
> + type_tucmd_mlhl = IXGBE_ADVTXD_TUCMD_IPV4 |
> + IXGBE_ADVTXD_TUCMD_L4T_TCP |
> + IXGBE_ADVTXD_DTYP_CTXT | IXGBE_ADVTXD_DCMD_DEXT;;
> +
> + tx_offload_mask.l2_len = ~0;
> + tx_offload_mask.l3_len = ~0;
> + tx_offload_mask.l4_len = ~0;
> + tx_offload_mask.tso_segsz = ~0;
> + mss_l4len_idx |= tx_offload.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT;
> + mss_l4len_idx |= tx_offload.l4_len << IXGBE_ADVTXD_L4LEN_SHIFT;
> + } else { /* no TSO, check if hardware checksum is needed */
> + if (ol_flags & PKT_TX

[dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of an ol_flag

2014-11-17 Thread Ananyev, Konstantin
Hi Oliver,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev at dpdk.org
> Cc: jigsaw at gmail.com
> Subject: [dpdk-dev] [PATCH v2 06/13] mbuf: add functions to get the name of 
> an ol_flag
> 
> In test-pmd (rxonly.c), the code is able to dump the list of ol_flags.
> The issue is that the list of flags in the application has to be
> synchronized with the flags defined in rte_mbuf.h.
> 
> This patch introduces 2 new functions rte_get_rx_ol_flag_name()
> and rte_get_tx_ol_flag_name() that returns the name of a flag from
> its mask. It also fixes rxonly.c to use this new functions and to
> display the proper flags.
> 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/rxonly.c  | 36 ++--
>  lib/librte_mbuf/rte_mbuf.c | 45 +
>  lib/librte_mbuf/rte_mbuf.h | 22 ++
>  3 files changed, 77 insertions(+), 26 deletions(-)
> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 9ad1df6..51a530a 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -71,26 +71,6 @@
> 
>  #include "testpmd.h"
> 
> -#define MAX_PKT_RX_FLAGS 13
> -static const char *pkt_rx_flag_names[MAX_PKT_RX_FLAGS] = {
> - "VLAN_PKT",
> - "RSS_HASH",
> - "PKT_RX_FDIR",
> - "IP_CKSUM",
> - "IP_CKSUM_BAD",
> -
> - "IPV4_HDR",
> - "IPV4_HDR_EXT",
> - "IPV6_HDR",
> - "IPV6_HDR_EXT",
> -
> - "IEEE1588_PTP",
> - "IEEE1588_TMST",
> -
> - "TUNNEL_IPV4_HDR",
> - "TUNNEL_IPV6_HDR",
> -};
> -
>  static inline void
>  print_ether_addr(const char *what, struct ether_addr *eth_addr)
>  {
> @@ -214,12 +194,16 @@ pkt_burst_receive(struct fwd_stream *fs)
>   printf(" - Receive queue=0x%x", (unsigned) fs->rx_queue);
>   printf("\n");
>   if (ol_flags != 0) {
> - int rxf;
> -
> - for (rxf = 0; rxf < MAX_PKT_RX_FLAGS; rxf++) {
> - if (ol_flags & (1 << rxf))
> - printf("  PKT_RX_%s\n",
> -pkt_rx_flag_names[rxf]);
> + unsigned rxf;
> + const char *name;
> +
> + for (rxf = 0; rxf < sizeof(mb->ol_flags) * 8; rxf++) {
> + if ((ol_flags & (1ULL << rxf)) == 0)
> + continue;
> + name = rte_get_rx_ol_flag_name(1ULL << rxf);
> + if (name == NULL)
> + continue;
> + printf("  %s\n", name);
>   }
>   }
>   rte_pktmbuf_free(mb);
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 52e7574..5cd9137 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -196,3 +196,48 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, 
> unsigned dump_len)
>   nb_segs --;
>   }
>  }
> +
> +/*
> + * Get the name of a RX offload flag
> + */
> +const char *rte_get_rx_ol_flag_name(uint64_t mask)
> +{
> + switch (mask) {
> + case PKT_RX_VLAN_PKT: return "PKT_RX_VLAN_PKT";
> + case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
> + case PKT_RX_FDIR: return "PKT_RX_FDIR";
> + case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
> + case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
> + /* case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD"; */
> + /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
> + /* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
> + /* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
> + /* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */

Didn't spot it before, wonder why do you need these 5 commented out lines?
In fact, why do we need these flags if they all equal to zero right now?
I know these flags were not introduced by that patch, in fact as I can see it 
was a temporary measure,
as old ol_flags were just 16 bits long:
http://dpdk.org/ml/archives/dev/2014-June/003308.html
So wonder should now these flags either get proper values or be removed?

Konstantin

> + case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
> + case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
> + case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
> + case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
> + case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
> + case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
> + case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
> + case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
> + default: return NULL;
> + }
> +}
> +
> +/*
> + * Get the name of a TX offload flag
> + */
> +const char *rte_get_tx_ol_flag_

[dpdk-dev] [PATCH] lib: include rte_memory.h for __rte_cache_aligned

2014-11-17 Thread Jia Yu
Thanks for reviewing this change.

No error message was found without this patch. This patch follows a good
coding practice to include the header files when it?s needed, rather than
through some other header files. In the past, we found some data structure
was not cache line aligned due to missing the include, and caused memory
corruption. Compiler did not give warning or error when
__rte_cache_aligned attribute definition is missing. With DPDK latest, I
checked the data structures that used __rte_cache_aligned attributed in
these files, and they are properly aligned. So this patch is mainly for
better coding style now.

pahole -s libfile.[o, a]

ip_frag_pkt 192 1
ip_frag_tbl_stat64  0
malloc_elem 64  0
rte_mbuf128 1
rte_table_lpm   10881
rte_table_hash  64  1

Thanks,
Jia



On 11/10/14, 1:46 AM, "Thomas Monjalon"  wrote:

>2014-11-07 09:28, Jia Yu:
>> Include rte_memory.h for lib files that use __rte_cache_aligned
>> attribute.
>
>Please could you explain what was the error?
>As I suspect it's a fix, it would be clearer to start your title with
>"fix".
>
>Thanks
>-- 
>Thomas



[dpdk-dev] Performance impact with QoS

2014-11-17 Thread Dumitrescu, Cristian
Hi Satish,

The QoS traffic manager has a large memory footprint due to large number of 
packet queues (e.g. 64K queues of 64 packets each) and large tables (e.g. 4K 
pipes with one cache line of context per pipe) that far exceeds the amount of 
CPU cache physically available. There are a lot of data structures that need to 
be brought into the L1 cache of the traffic manager core in order to take the 
scheduling decision: bitmap, pipe table entry, queue read/write pointers, queue 
elements, packet metadata (mbuf), etc. To minimize the penalties associated 
with the CPU pipeline stalling due to memory accesses, all these data 
structures are prefetched.

So, the point I am trying to make is there are a lot of critical CPU resources 
involved: size of L1/L2 cache (per CPU core), size of L3 cache (shared by all 
CPU cores), bandwidth of L1/L2 cache (per core), bandwidth of L3 cache (shared 
by all CPU cores), number of outstanding prefetches (per CPU core), etc.

If you map the QoS traffic manager on the same core with packet I/O (i.e. Poll 
Mode Driver RX/TX), my guess is these two I/O intensive workloads will both 
compete for the CPU resources listed above and will also impact each other by 
thrashing each other data structures in and out of L1/L2 cache. If you split 
them on different CPU cores, their operation is more performant and more 
predictable, as each one is having its own L1/L2 cache now.

Did you try a CPU core chaining setup (through rte_rings) similar to qos_sched 
application, like: RX -> (TM enqueue & dequeue) -> TX or RX -> (TM enqueue & TM 
dequeue & TX)? I am sure you will find the right setup for you by conducting 
similar experiments. Of course, result also depends on which other workloads 
your application is performing.

Regards,
Cristian

From: satish [mailto:nsatishb...@gmail.com]
Sent: Monday, November 17, 2014 6:03 AM
To: dev at dpdk.org
Cc: Dumitrescu, Cristian
Subject: Re: Performance impact with QoS

Hi All,
Can someone please provide comments on queries in below mail?

Regards,
Satish Babu

On Mon, Nov 10, 2014 at 4:24 PM, satish mailto:nsatishbabu at gmail.com>> wrote:
Hi,
I need comments on performance impact with DPDK-QoS.

We are working on developing a application based on DPDK.
Our application supports IPv4 forwarding with and without QoS.

Without QOS, we are achieving almost full wire rate (bi-directional traffic) 
with 128, 256 and 512 byte packets.
But when we enabled QoS, performance dropped to half for 128 and 256 byte 
packets.
For 512 byte packet, we didn't observe any drop even after enabling QoS 
(Achieving full wire rate).
Traffic used in both the cases is same. ( One stream with Qos match to first 
queue in traffic class 0)

In our application, we are using memory buffer pools to receive the packet 
bursts (Ring buffer is not used).
Same buffer is used during packet processing and TX (enqueue and dequeue). All 
above handled on the same core.

For normal forwarding(without QoS), we are using rte_eth_tx_burst for TX.

For forwarding with QoS, using rte_sched_port_pkt_write(), 
rte_sched_port_enqueue () and rte_sched_port_dequeue ()
before rte_eth_tx_burst ().

We understood that performance dip in case of 128 and 256 byte packet is bacause
of processing more number of packets compared to 512 byte packet.

Can some comment on performance dip in my case with QOS enabled?
[1] can this be because of inefficient use of RTE calls for QoS?
[2] Is it the poor buffer management?
[3] any other comments?

To achieve good performance in QoS case, is it must to use worker thread 
(running on different core) with ring buffer?

Please provide your comments.

Thanks in advance.

Regards,
Satish Babu




--
Regards,
Satish Babu
--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.



[dpdk-dev] [PATCH] eal: allow virtual devices to be white/black listed

2014-11-17 Thread Neil Horman
On Mon, Nov 17, 2014 at 04:52:50PM +, Bruce Richardson wrote:
> On Mon, Nov 17, 2014 at 04:40:19PM +, Nicolas Pernas Maradei wrote:
> > 
> > On 17/11/14 16:00, Neil Horman wrote:
> > >I get that, I was more asking, why those values?  They seem a bit magic to 
> > >me,
> > >and might benefit from some descriptive macros or comments so they make 
> > >more
> > >sense
> > >Neil
> > OK, I get you now.
> > 
> > Maybe the diff is not very clear. I just left the original calls to
> > rte_eal_devargs_add() with the original values which they looked clear to me
> > but I don't mind improving the comments a bit and as you suggested, adding
> > in some macros.
> > 
> > I'll wait for some more feedback before submitting a second version of the
> > patch.
> >
> 
> Those are just example values for testing the parsing functionality. I'm don't
> think descriptive macros are particularly needed, though maybe a short comment
> or two might help out. The specific numeric values themselves aren't really 
> important,
> they are just examples of different patterns.
> 
> /Bruce
> 
> 

Agreed, if they're just arbitrary bus/dev/func values, Marcos aren't needed,
just a comment calling them out as such.

Thanks!
Neil



[dpdk-dev] [PATCH v2 03/13] mbuf: move vxlan_cksum flag definition at the proper place

2014-11-17 Thread Thomas Monjalon
2014-11-14 18:03, Olivier Matz:
> The tx mbuf flags are ordered from the highest value to the
> the lowest. Move the PKT_TX_VXLAN_CKSUM at the right place.

Please, could you reorder them from the lowest to the highest?
It will be simpler to understand. There is already a comment to explain
the reverse allocation of Tx flags.

Thanks
-- 
Thomas


[dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP segmentation offload

2014-11-17 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, November 14, 2014 5:03 PM
> To: dev at dpdk.org
> Cc: jigsaw at gmail.com
> Subject: [dpdk-dev] [PATCH v2 10/13] mbuf: generic support for TCP 
> segmentation offload
> 
> Some of the NICs supported by DPDK have a possibility to accelerate TCP
> traffic by using segmentation offload. The application prepares a packet
> with valid TCP header with size up to 64K and deleguates the
> segmentation to the NIC.
> 
> Implement the generic part of TCP segmentation offload in rte_mbuf. It
> introduces 2 new fields in rte_mbuf: l4_len (length of L4 header in bytes)
> and tso_segsz (MSS of packets).
> 
> To delegate the TCP segmentation to the hardware, the user has to:
> 
> - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>   PKT_TX_TCP_CKSUM)
> - set PKT_TX_IP_CKSUM if it's IPv4, and set the IP checksum to 0 in
>   the packet
> - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> - calculate the pseudo header checksum without taking ip_len in account,
>   and set it in the TCP header, for instance by using
>   rte_ipv4_phdr_cksum(ip_hdr, ol_flags)
> 
> The API is inspired from ixgbe hardware (the next commit adds the
> support for ixgbe), but it seems generic enough to be used for other
> hw/drivers in the future.
> 
> This commit also reworks the way l2_len and l3_len are used in igb
> and ixgbe drivers as the l2_l3_len is not available anymore in mbuf.
> 
> Signed-off-by: Mirek Walukiewicz 
> Signed-off-by: Olivier Matz 

Acked-by: Konstantin Ananyev 

> ---
>  app/test-pmd/testpmd.c|  2 +-
>  examples/ipv4_multicast/main.c|  2 +-
>  lib/librte_mbuf/rte_mbuf.c|  1 +
>  lib/librte_mbuf/rte_mbuf.h| 44 
> +++
>  lib/librte_net/rte_ip.h   | 39 +++---
>  lib/librte_pmd_e1000/igb_rxtx.c   | 11 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 11 +-
>  7 files changed, 81 insertions(+), 29 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 12adafa..632a993 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -408,7 +408,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>   mb->ol_flags = 0;
>   mb->data_off = RTE_PKTMBUF_HEADROOM;
>   mb->nb_segs  = 1;
> - mb->l2_l3_len   = 0;
> + mb->tx_offload   = 0;
>   mb->vlan_tci = 0;
>   mb->hash.rss = 0;
>  }
> diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
> index 590d11a..80c5140 100644
> --- a/examples/ipv4_multicast/main.c
> +++ b/examples/ipv4_multicast/main.c
> @@ -302,7 +302,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
>   /* copy metadata from source packet*/
>   hdr->port = pkt->port;
>   hdr->vlan_tci = pkt->vlan_tci;
> - hdr->l2_l3_len = pkt->l2_l3_len;
> + hdr->tx_offload = pkt->tx_offload;
>   hdr->hash = pkt->hash;
> 
>   hdr->ol_flags = pkt->ol_flags;
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 5cd9137..75295c8 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -238,6 +238,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
>   case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
>   case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
>   case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> + case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
>   default: return NULL;
>   }
>  }
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 3c8e825..9f44d08 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -127,6 +127,20 @@ extern "C" {
> 
>  #define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed 
> by NIC */
> 
> +/**
> + * TCP segmentation offload. To enable this offload feature for a
> + * packet to be transmitted on hardware supporting TSO:
> + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> + *PKT_TX_TCP_CKSUM)
> + *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
> + *to 0 in the packet
> + *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
> + *  - calculate the pseudo header checksum without taking ip_len in accound,
> + *and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
> + *rte_ipv6_phdr_cksum() that can be used as helpers.
> + */
> +#define PKT_TX_TCP_SEG   (1ULL << 49)
> +
>  /* Use final bit of flags to indicate a control mbuf */
>  #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */
> 
> @@ -228,22 +242,18 @@ struct rte_mbuf {
> 
>   /* fields to support TX offloads */
>   union {
> - uint16_t l2_l3_len; /**< combined l2/l3 lengths as single var */
> + uint64_t tx_offload;   /**< combined f