Re: [dpdk-dev] [PATCH 1/4] vhost: move fdset functions from fd_man.c to fd_man.h

2018-03-05 Thread Thomas Monjalon
05/03/2018 08:43, Yang, Zhiyong:
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > 01/03/2018 07:02, Tan, Jianfeng:
> > > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > > > On 02/28/2018 02:36 AM, Yang, Zhiyong wrote:
> > > > > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > > > >> On 02/14/2018 03:53 PM, Zhiyong Yang wrote:
> > > > >>>lib/librte_vhost/Makefile |   3 +-
> > > > >>>lib/librte_vhost/fd_man.c | 274 ------------
> > > > >>>lib/librte_vhost/fd_man.h | 258 +--
> > > > >>>3 files changed, 253 insertions(+), 282 deletions(-)
> > > > >>>delete mode 100644 lib/librte_vhost/fd_man.c
> > > > >>
> > > > >> I disagree with the patch.
> > > > >> It is a good thing to reuse the code, but to do it, you need to
> > > > >> extend the vhost lib API.
> > > > >>
> > > > >> New API need to be prefixed with rte_vhost_, and be declared in
> > > > >> rte_vhost.h.
> > > > >>
> > > > >> And no need to move the functions from the .c to the .h file, as
> > > > >> it
> > > > moreover
> > > > >> makes you inline them, which is not necessary here.
> > > > >
> > > > > Thanks for your reviewing the series firstly, Maxime. :)
> > > > >
> > > > > I considered doing it as you said. However, in the end I still
> > > > > preferred this approach. Here are my reasons.
> > > > > 1) As far as I know, this set of functions was used privately in
> > > > > librte_vhost before this feature. There is no strong request for it
> > > > > from the perspective of DPDK applications. If I understand well, it
> > > > > is enough to expose the functions to all PMDs, and it is better to
> > > > > keep them for internal use in DPDK.
> > > >
> > > > But what the patch is doing is adding fd_man.h to the API, without
> > > > doing it properly. fd_man.h will be installed with other header
> > > > files, and any external application can use it.
> > > >
> > > > >
> > > > > 2) These functions help to implement vhost-user, but they are not
> > > > > strongly related to the other vhost-user APIs that are already
> > > > > exposed. If we want to expose them as APIs at the lib layer, many
> > > > > functions and related data structures have to be exposed in
> > > > > rte_vhost.h, which looks messy.
> > > > > Your opinion?
> > > >
> > > > Yes, it is not really vhost-related; it could be part of a more
> > > > generic library. It may be better to duplicate these lines, or to
> > > > move this code into an existing or new library.
> > >
> > > I vote to move it to a generic library, maybe EAL. poll() has better
> > > compatibility even though it is not as performant as epoll().
> > >
> > > Thomas, what do you think?
> > 
> > I don't see why it should be exported outside of DPDK, except for PMDs.
> > I would tend to keep it internal but I understand that it would mean
> > duplicating some code, which is not ideal.
> > Please could you show what would be the content of the .h in EAL?
> > 
> 
> If they need to be exposed in eal.h,
> I think it should be the whole fdset mechanism, as follows.
> 
> typedef void (*fd_cb)(int fd, void *dat, int *remove);
> 
> struct fdentry {
>   int fd; /* -1 indicates this entry is empty */
>   fd_cb rcb;  /* callback when this fd is readable. */
>   fd_cb wcb;  /* callback when this fd is writeable.*/
>   void *dat;  /* fd context */
>   int busy;   /* whether this entry is being used in cb. */
> };
> 
> struct fdset {
>   struct pollfd rwfds[MAX_FDS];
>   struct fdentry fd[MAX_FDS];
>   pthread_mutex_t fd_mutex;
>   int num;/* current fd number of this fdset */
> };
> 
> void fdset_init(struct fdset *pfdset);(not used in the patchset)
> 
> int fdset_add(struct fdset *pfdset, int fd,
>   fd_cb rcb, fd_cb wcb, void *dat); (used in this patchset)
> 
> void *fdset_del(struct fdset *pfdset, int fd); (not used in the patchset)
> 
> void *fdset_event_dispatch(void *arg);   (used in this patchset)
> 
> It seems that we have 4 options.
> 1) expose them in librte_vhost
> 2) expose them in other existing or new libs. for example,  eal.
> 3) duplicate the code lines at PMD layer.
> 4) do it as the patch does that.

It looks to be very close to the interrupt thread.
Can we have it all merged into a unique event dispatcher thread?
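
For reference, a minimal usage sketch of the fdset mechanism being discussed,
roughly as librte_vhost drives it today (the header path, listen fd and
callback names here are hypothetical, only for illustration):

#include <pthread.h>
#include "fd_man.h"  /* the header under discussion */

static struct fdset pmd_fdset;

static void
listen_sock_cb(int fd, void *dat, int *remove)
{
        /* accept()/read() on fd; set *remove = 1 to drop it from the set. */
        (void)fd; (void)dat; (void)remove;
}

static int
start_fd_thread(int listen_fd, void *ctx)
{
        pthread_t tid;

        fdset_init(&pmd_fdset);
        if (fdset_add(&pmd_fdset, listen_fd, listen_sock_cb, NULL, ctx) < 0)
                return -1;
        /* A single dispatcher thread polls all registered fds. */
        return pthread_create(&tid, NULL, fdset_event_dispatch, &pmd_fdset);
}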





Re: [dpdk-dev] [PATCH 31/41] ethdev: use contiguous allocation for DMA memory

2018-03-05 Thread Burakov, Anatoly

On 03-Mar-18 2:05 PM, Andrew Rybchenko wrote:

On 03/03/2018 04:46 PM, Anatoly Burakov wrote:

This fixes the following drivers in one go:


Does it mean that these drivers are broken in the middle of the patch set 
and only fixed now?

If so, it would be good to avoid it. It breaks bisect.



Depends on the definition of "broken". Legacy memory mode will still 
work for all drivers throughout the patchset. As for the new memory mode, 
yes, it will be "broken in the middle of the patchset", but due to the 
fact that there's an enormous amount of code to review between the fbarray 
changes, malloc changes, contiguous allocation changes and the new 
rte_memzone APIs, I favored ease of code review over bisectability.


I can of course reorder and roll up several different patchsets and all 
the driver updates into one giant patch, but do you really want to be the 
one reviewing such a patch?


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 31/41] ethdev: use contiguous allocation for DMA memory

2018-03-05 Thread Andrew Rybchenko

On 03/05/2018 12:08 PM, Burakov, Anatoly wrote:

On 03-Mar-18 2:05 PM, Andrew Rybchenko wrote:

On 03/03/2018 04:46 PM, Anatoly Burakov wrote:

This fixes the following drivers in one go:


Does it mean that these drivers are broken in the middle of the patch set 
and only fixed now?

If so, it would be good to avoid it. It breaks bisect.



Depends on the definition of "broken". Legacy memory mode will still 
work for all drivers throughout the patchset. As for the new memory mode, 
yes, it will be "broken in the middle of the patchset", but due to the 
fact that there's an enormous amount of code to review between the fbarray 
changes, malloc changes, contiguous allocation changes and the new 
rte_memzone APIs, I favored ease of code review over bisectability.


I can of course reorder and roll up several different patchsets and all 
the driver updates into one giant patch, but do you really want to be the 
one reviewing such a patch?


Is it possible to:
1. Introduce _contig function
2. Switch users of the contiguous allocation to it as you do now
3. Make the old function allocate possibly non-contiguous memory
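
Roughly, something like the following (the _contig name is only an assumption
for illustration here, not an existing rte_memzone API):

#include <rte_memzone.h>

/* Step 1: hypothetical, explicitly IOVA-contiguous variant. */
const struct rte_memzone *
rte_memzone_reserve_contig(const char *name, size_t len, int socket_id,
                           unsigned int flags);

/* Step 2: DMA users that really need contiguous memory switch to it. */
static const struct rte_memzone *
alloc_hw_ring(size_t ring_size, int socket_id)
{
        return rte_memzone_reserve_contig("hw_ring", ring_size, socket_id, 0);
}

/* Step 3: the existing call keeps its name, but may now return memory
 * that is not IOVA-contiguous. */
static const struct rte_memzone *
alloc_plain_buf(size_t len, int socket_id)
{
        return rte_memzone_reserve("plain_buf", len, socket_id, 0);
}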



[dpdk-dev] [PATCH v2 1/6] vhost: export vhost feature definitions

2018-03-05 Thread Zhihong Wang
This patch exports vhost-user protocol features to support device driver
development.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/rte_vhost.h  |  8 
 lib/librte_vhost/vhost.h  |  4 +---
 lib/librte_vhost/vhost_user.c |  9 +
 lib/librte_vhost/vhost_user.h | 20 +++-
 4 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index d33206997..b05162366 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -29,6 +29,14 @@ extern "C" {
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY   (1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT   (1ULL << 3)
 
+#define RTE_VHOST_USER_PROTOCOL_F_MQ   0
+#define RTE_VHOST_USER_PROTOCOL_F_LOG_SHMFD1
+#define RTE_VHOST_USER_PROTOCOL_F_RARP 2
+#define RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK3
+#define RTE_VHOST_USER_PROTOCOL_F_NET_MTU  4
+#define RTE_VHOST_USER_PROTOCOL_F_SLAVE_REQ5
+#define RTE_VHOST_USER_F_PROTOCOL_FEATURES 30
+
 /**
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 58aec2e0d..a0b0520e2 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -174,8 +174,6 @@ struct vhost_msg {
  #define VIRTIO_F_VERSION_1 32
 #endif
 
-#define VHOST_USER_F_PROTOCOL_FEATURES 30
-
 /* Features supported by this builtin vhost-user net driver. */
 #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
(1ULL << VIRTIO_F_ANY_LAYOUT) | \
@@ -185,7 +183,7 @@ struct vhost_msg {
(1ULL << VIRTIO_NET_F_MQ)  | \
(1ULL << VIRTIO_F_VERSION_1)   | \
(1ULL << VHOST_F_LOG_ALL)  | \
-   (1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
+   (1ULL << RTE_VHOST_USER_F_PROTOCOL_FEATURES) | \
(1ULL << VIRTIO_NET_F_GSO) | \
(1ULL << VIRTIO_NET_F_HOST_TSO4) | \
(1ULL << VIRTIO_NET_F_HOST_TSO6) | \
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5c5361066..c93e48e4d 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -527,7 +527,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, 
VhostUserMsg *msg)
vring_invalidate(dev, vq);
 
if (vq->enabled && (dev->features &
-   (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+   (1ULL << RTE_VHOST_USER_F_PROTOCOL_FEATURES))) {
dev = translate_ring_addresses(dev, msg->payload.addr.index);
if (!dev)
return -1;
@@ -897,11 +897,11 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, 
struct VhostUserMsg *pmsg)
vq = dev->virtqueue[file.index];
 
/*
-* When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
+* When RTE_VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
 * the ring starts already enabled. Otherwise, it is enabled via
 * the SET_VRING_ENABLE message.
 */
-   if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)))
+   if (!(dev->features & (1ULL << RTE_VHOST_USER_F_PROTOCOL_FEATURES)))
vq->enabled = 1;
 
if (vq->kickfd >= 0)
@@ -1012,7 +1012,8 @@ vhost_user_get_protocol_features(struct virtio_net *dev,
 * Qemu versions (from v2.7.0 to v2.9.0).
 */
if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
-   protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);
+   protocol_features &=
+   ~(1ULL << RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
msg->payload.u64 = protocol_features;
msg->size = sizeof(msg->payload.u64);
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 0fafbe6e0..066e772dd 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -14,19 +14,13 @@
 
 #define VHOST_MEMORY_MAX_NREGIONS 8
 
-#define VHOST_USER_PROTOCOL_F_MQ   0
-#define VHOST_USER_PROTOCOL_F_LOG_SHMFD1
-#define VHOST_USER_PROTOCOL_F_RARP 2
-#define VHOST_USER_PROTOCOL_F_REPLY_ACK3
-#define VHOST_USER_PROTOCOL_F_NET_MTU 4
-#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
-
-#define VHOST_USER_PROTOCOL_FEATURES   ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
-                                        (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \
-                                        (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
-                                        (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
-                                        (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
-                                        (1ULL << VHOST_

[dpdk-dev] [PATCH v2 0/6] vhost: support selective datapath

2018-03-05 Thread Zhihong Wang
This patch set introduces support for selective datapath in the DPDK
vhost-user lib. vDPA stands for vhost Data Path Acceleration. The idea is to
let various types of virtio-compatible devices do data transfer directly with
the virtio driver, to enable acceleration.

The default datapath is the existing software implementation; more options
will become available as new engines are added.

Design details


An engine is a group of virtio-compatible devices. The definition of engine
is as follows:

struct rte_vdpa_eng_addr {
union {
uint8_t __dummy[64];
struct rte_pci_addr pci_addr;
};
};

struct rte_vdpa_eng_info {
char name[MAX_VDPA_NAME_LEN];
struct rte_vdpa_eng_addr *addr;
};

struct rte_vdpa_dev_ops {
    vdpa_dev_conf_t           dev_conf;
    vdpa_dev_close_t          dev_close;
    vdpa_vring_state_set_t    vring_state_set;
    vdpa_feature_set_t        feature_set;
    vdpa_migration_done_t     migration_done;
    vdpa_get_vfio_group_fd_t  get_vfio_group_fd;
    vdpa_get_vfio_device_fd_t get_vfio_device_fd;
    vdpa_get_notify_area_t    get_notify_area;
};

struct rte_vdpa_eng_ops {
vdpa_eng_init_t   eng_init;
vdpa_eng_uninit_t eng_uninit;
vdpa_info_query_t info_query;
};

struct rte_vdpa_eng_driver {
const char *name;
struct rte_vdpa_eng_ops eng_ops;
struct rte_vdpa_dev_ops dev_ops;
} __rte_cache_aligned;

struct rte_vdpa_engine {
    struct rte_vdpa_eng_info    eng_info;
struct rte_vdpa_eng_driver *eng_drv;
} __rte_cache_aligned;

A set of engine ops is defined in rte_vdpa_eng_ops for engine init, uninit,
and attributes reporting. The attributes are defined as follows:

struct rte_vdpa_eng_attr {
uint64_t features;
uint64_t protocol_features;
uint32_t queue_num;
uint32_t dev_num;
};

A set of device ops is defined in rte_vdpa_dev_ops for each virtio device
in the engine to do device specific operations.

Changes to the current vhost-user lib are:


 1. Make vhost device capabilities configurable to adopt various engines.
Such capabilities include supported features, protocol features, queue
number. APIs are introduced to let app configure these capabilities.

 2. In addition to the existing vhost framework, a set of callbacks is
added for vhost to call the driver for device operations at the right
time:

 a. dev_conf: Called to configure the actual device when the virtio
device becomes ready.

 b. dev_close: Called to close the actual device when the virtio device
is stopped.

 c. vring_state_set: Called to change the state of the vring in the
actual device when vring state changes.

 d. feature_set: Called to set the negotiated features to device.

 e. migration_done: Called to allow the device to respond to RARP
sending.

 f. get_vfio_group_fd: Called to get the VFIO group fd of the device.

 g. get_vfio_device_fd: Called to get the VFIO device fd of the device.

 h. get_notify_area: Called to get the notify area info of the queue.

 3. To make vhost aware of its own type, an engine id (eid) and a device
id (did) are added into the vhost data structure to identify the actual
device. APIs are introduced to let the app configure them. When the default
software datapath is used, eid and did are set to -1. When an alternative
datapath is used, eid and did are set by the app to specify which device to
use. Each vhost-user socket can have only one connection in this case.

Working process:


 1. Register driver during DPDK initialization.

 2. Register engine with driver name and address.

 3. Get engine attributes.

 4. For vhost device creation:

  a. Register vhost-user socket.

  b. Set eid and did of the vhost-user socket.

  c. Register vhost-user callbacks.

  d. Start to wait for connection.

 5. When a connection comes and the virtio device data structure is
negotiated, the device will be configured with all the needed info.
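
A rough sketch of this working process from the application side, assuming
the APIs introduced in this series (the engine name, engine address and
device id below are hypothetical, and error handling is trimmed):

#include <rte_vhost.h>
#include <rte_vdpa.h>

static int
setup_vdpa_port(const char *path, const char *eng_name,
                struct rte_vdpa_eng_addr *addr, int did)
{
        struct rte_vdpa_eng_attr attr;
        int eid;

        /* 1. Driver registration is done by the engine driver itself. */

        /* 2. Register the engine with driver name and address. */
        if (rte_vdpa_register_engine(eng_name, addr) < 0)
                return -1;
        eid = rte_vdpa_find_engine_id(addr);

        /* 3. Get engine attributes (features, queue_num, dev_num). */
        if (rte_vdpa_info_query(eid, &attr) < 0)
                return -1;

        /* 4a/4b. Register the vhost-user socket and bind it to eid/did. */
        if (rte_vhost_driver_register(path, 0) < 0)
                return -1;
        rte_vhost_driver_set_vdpa_eid(path, eid);
        rte_vhost_driver_set_vdpa_did(path, did);

        /* 4c. rte_vhost_driver_callback_register() would go here. */

        /* 4d. Start waiting for a connection; once the virtio device is
         * negotiated, the driver's dev_conf callback configures the device.
         */
        return rte_vhost_driver_start(path);
}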

---
Changes in v2:

 1. Ensure negotiated capabilities are supported in vhost-user lib.

 2. Add APIs for live migration.

 3. Configure the data path at the right time.

 4. Add VFIO related vDPA device ops.

 5. Rebase on dpdk-next-virtio.

Zhihong Wang (6):
  vhost: export vhost feature definitions
  vhost: support selective datapath
  vhost: add apis for datapath configuration
  vhost: adapt vhost lib for selective datapath
  vhost: add apis for live migration
  vhost: export new apis

 lib/librte_vhost/Makefile  |   4 +-
 lib/librte_vhost/rte_vdpa.h| 126 ++
 lib/librte_vhost/rte_vhost.h   | 157 +
 lib/librte_vhost/rte_vhost_version.map |  19 
 lib/librte_vhost/socket.c  | 141 -
 lib/librte_vhost/vdpa.c| 124 ++
 lib/librte_vho

[dpdk-dev] [PATCH v2 3/6] vhost: add apis for datapath configuration

2018-03-05 Thread Zhihong Wang
This patch adds APIs for datapath configuration. The eid and did of the
vhost-user socket can be configured to identify the actual device.

When the default software datapath is used, eid and did are set to -1.
When an alternative datapath is used, eid and did are set by the app to
specify which device to use. Each vhost-user socket can have only one
connection in this case.
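
For illustration, a small sketch of how a backend could later retrieve these
values for a connected device, e.g. from its new_device callback (the
callback name and wiring here are hypothetical, not part of this patch):

#include <rte_vhost.h>

static int
demo_new_device(int vid)
{
        int eid = rte_vhost_get_vdpa_eid(vid);
        int did = rte_vhost_get_vdpa_did(vid);

        if (eid < 0 || did < 0) {
                /* -1 means the default software datapath is in use. */
                return 0;
        }

        /* Hand this connection over to engine 'eid', device 'did'. */
        return 0;
}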

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/rte_vhost.h | 64 +++
 lib/librte_vhost/socket.c| 65 
 lib/librte_vhost/vhost.c | 50 ++
 lib/librte_vhost/vhost.h | 10 +++
 4 files changed, 189 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index b05162366..a76acea6b 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -178,6 +178,50 @@ int rte_vhost_driver_register(const char *path, uint64_t 
flags);
 int rte_vhost_driver_unregister(const char *path);
 
 /**
+ * Set the engine id, enforce single connection per socket
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param eid
+ *  Engine id
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_set_vdpa_eid(const char *path, int eid);
+
+/**
+ * Set the device id, enforce single connection per socket
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param did
+ *  Device id
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_set_vdpa_did(const char *path, int did);
+
+/**
+ * Get the engine id
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @return
+ *  Engine id, -1 on failure
+ */
+int rte_vhost_driver_get_vdpa_eid(const char *path);
+
+/**
+ * Get the device id
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @return
+ *  Device id, -1 on failure
+ */
+int rte_vhost_driver_get_vdpa_did(const char *path);
+
+/**
  * Set the feature bits the vhost-user driver supports.
  *
  * @param path
@@ -442,6 +486,26 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
 
+/**
+ * Get vdpa engine id for vhost device.
+ *
+ * @param vid
+ *  vhost device ID
+ * @return
+ *  engine id
+ */
+int rte_vhost_get_vdpa_eid(int vid);
+
+/**
+ * Get vdpa device id for vhost device.
+ *
+ * @param vid
+ *  vhost device ID
+ * @return
+ *  device id
+ */
+int rte_vhost_get_vdpa_did(int vid);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 6ba60f5dc..5367ba771 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -52,6 +52,13 @@ struct vhost_user_socket {
uint64_t supported_features;
uint64_t features;
 
+   /* engine and device id to identify a certain port on a specific
+* backend, both are set to -1 for sw. when used, one socket can
+* have 1 connection only.
+*/
+   int eid;
+   int did;
+
struct vhost_device_ops const *notify_ops;
 };
 
@@ -535,6 +542,64 @@ find_vhost_user_socket(const char *path)
 }
 
 int
+rte_vhost_driver_set_vdpa_eid(const char *path, int eid)
+{
+   struct vhost_user_socket *vsocket;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   vsocket->eid = eid;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   return vsocket ? 0 : -1;
+}
+
+int
+rte_vhost_driver_set_vdpa_did(const char *path, int did)
+{
+   struct vhost_user_socket *vsocket;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   vsocket->did = did;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   return vsocket ? 0 : -1;
+}
+
+int
+rte_vhost_driver_get_vdpa_eid(const char *path)
+{
+   struct vhost_user_socket *vsocket;
+   int eid = -1;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   eid = vsocket->eid;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   return eid;
+}
+
+int
+rte_vhost_driver_get_vdpa_did(const char *path)
+{
+   struct vhost_user_socket *vsocket;
+   int did = -1;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   did = vsocket->did;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   return did;
+}
+
+int
 rte_vhost_driver_disable_features(const char *path, uint64_t features)
 {
struct vhost_user_socket *vsocket;
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f6f12a03b..45cf90f99 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -283,6 +283,8 @@ vhost_new_device(void)
dev->vid = i;
dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
dev->slave_req_fd = -1;
+   dev->eid = 

[dpdk-dev] [PATCH v2 2/6] vhost: support selective datapath

2018-03-05 Thread Zhihong Wang
This patch introduces support for selective datapath in the DPDK vhost-user
lib, to let various types of virtio-compatible devices do data transfer
directly with the virtio driver and thus enable acceleration. The default
datapath is the existing software implementation; more options will become
available as new engines are registered.

An engine is a group of virtio-compatible devices under a single address.
The engine driver includes:

 1. A set of engine ops is defined in rte_vdpa_eng_ops to perform engine
init, uninit, and attributes reporting.

 2. A set of device ops is defined in rte_vdpa_dev_ops for virtio devices
in the engine to do device specific operations:

 a. dev_conf: Called to configure the actual device when the virtio
device becomes ready.

 b. dev_close: Called to close the actual device when the virtio device
is stopped.

 c. vring_state_set: Called to change the state of the vring in the
actual device when vring state changes.

 d. feature_set: Called to set the negotiated features to device.

 e. migration_done: Called to allow the device to respond to RARP
sending.

 f. get_vfio_group_fd: Called to get the VFIO group fd of the device.

 g. get_vfio_device_fd: Called to get the VFIO device fd of the device.

 h. get_notify_area: Called to get the notify area info of the queue.
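
For illustration, a rough sketch of how an engine driver might fill in and
register the ops listed above (the driver name and op bodies are
hypothetical; ops not shown are left NULL):

#include <rte_vdpa.h>

static int
demo_eng_init(int eid, struct rte_vdpa_eng_addr *addr)
{
        (void)eid; (void)addr;
        return 0; /* probe the device(s) behind 'addr' here */
}

static int
demo_info_query(int eid, struct rte_vdpa_eng_attr *attr)
{
        (void)eid;
        attr->features = 0;          /* features the hardware can handle */
        attr->protocol_features = 0;
        attr->queue_num = 1;
        attr->dev_num = 1;
        return 0;
}

static int
demo_dev_conf(int vid)
{
        (void)vid;
        return 0; /* program rings/doorbells once the virtio device is ready */
}

static struct rte_vdpa_eng_driver demo_vdpa_drv = {
        .name = "demo_vdpa",
        .eng_ops = {
                .eng_init = demo_eng_init,
                .info_query = demo_info_query,
        },
        .dev_ops = {
                .dev_conf = demo_dev_conf,
        },
};

/* Normally wrapped in the RTE_VDPA_REGISTER_DRIVER() constructor macro. */
void
demo_vdpa_register(void)
{
        rte_vdpa_register_driver(&demo_vdpa_drv);
}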

Signed-off-by: Zhihong Wang 
---
Changes in v2:

 1. Add VFIO related vDPA device ops.

 lib/librte_vhost/Makefile   |   4 +-
 lib/librte_vhost/rte_vdpa.h | 120 ++
 lib/librte_vhost/vdpa.c | 124 
 3 files changed, 246 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vdpa.h
 create mode 100644 lib/librte_vhost/vdpa.c

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 5d6c6abae..37044ac03 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev 
-lrte_net
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
-   vhost_user.c virtio_net.c
+   vhost_user.c virtio_net.c vdpa.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
new file mode 100644
index 0..1bde36f7f
--- /dev/null
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_VDPA_H_
+#define _RTE_VDPA_H_
+
+/**
+ * @file
+ *
+ * Device specific vhost lib
+ */
+
+#include 
+#include "rte_vhost.h"
+
+#define MAX_VDPA_ENGINE_NUM 128
+#define MAX_VDPA_NAME_LEN 128
+
+struct rte_vdpa_eng_addr {
+   union {
+   uint8_t __dummy[64];
+   struct rte_pci_addr pci_addr;
+   };
+};
+
+struct rte_vdpa_eng_info {
+   struct rte_vdpa_eng_addr *addr;
+   char name[MAX_VDPA_NAME_LEN];
+};
+
+struct rte_vdpa_eng_attr {
+   uint64_t features;
+   uint64_t protocol_features;
+   uint32_t queue_num;
+   uint32_t dev_num;
+};
+
+/* register/remove engine */
+typedef int (*vdpa_eng_init_t)(int eid, struct rte_vdpa_eng_addr *addr);
+typedef int (*vdpa_eng_uninit_t)(int eid);
+
+/* query info of this engine */
+typedef int (*vdpa_info_query_t)(int eid,
+   struct rte_vdpa_eng_attr *attr);
+
+/* driver configure/close the port based on connection */
+typedef int (*vdpa_dev_conf_t)(int vid);
+typedef int (*vdpa_dev_close_t)(int vid);
+
+/* enable/disable this vring */
+typedef int (*vdpa_vring_state_set_t)(int vid, int vring, int state);
+
+/* set features when changed */
+typedef int (*vdpa_feature_set_t)(int vid);
+
+/* destination operations when migration done, e.g. send rarp */
+typedef int (*vdpa_migration_done_t)(int vid);
+
+/* get the vfio group fd */
+typedef int (*vdpa_get_vfio_group_fd_t)(int vid);
+
+/* get the vfio device fd */
+typedef int (*vdpa_get_vfio_device_fd_t)(int vid);
+
+/* get the notify area info of the queue */
+typedef int (*vdpa_get_notify_area_t)(int vid, int qid, uint64_t *offset,
+   uint64_t *size);
+/* device ops */
+struct rte_vdpa_dev_ops {
+   vdpa_dev_conf_t   dev_conf;
+   vdpa_dev_close_t  dev_close;
+   vdpa_vring_state_set_t    vring_state_set;
+   vdpa_feature_set_t        feature_set;
+   vdpa_migration_done_t migration_done;
+   vdpa_get_vfio_group_fd_t  get_vfio_group_fd;
+   vdpa_get_vfio_device_fd_t get_vfio_device_fd;
+   vdpa_get_notify_area_t    get_notify_area;
+};
+
+/* engine ops */
+struct rte_vdpa_eng_ops {
+   vdpa_eng_init_t eng_init;
+   vdpa_eng_uninit_t eng_uninit;
+ 

[dpdk-dev] [PATCH v2 4/6] vhost: adapt vhost lib for selective datapath

2018-03-05 Thread Zhihong Wang
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.

Signed-off-by: Zhihong Wang 
---
Changes in v2:

 1. Ensure negotiated capabilities are supported in vhost-user lib.

 2. Configure the data path at the right time.

 lib/librte_vhost/rte_vhost.h  | 25 ++
 lib/librte_vhost/socket.c | 76 +--
 lib/librte_vhost/vhost.c  |  3 ++
 lib/librte_vhost/vhost.h  |  2 ++
 lib/librte_vhost/vhost_user.c | 56 +++
 5 files changed, 154 insertions(+), 8 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index a76acea6b..9bec36756 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -277,6 +277,31 @@ int rte_vhost_driver_disable_features(const char *path, 
uint64_t features);
 int rte_vhost_driver_get_features(const char *path, uint64_t *features);
 
 /**
+ * Get the protocol feature bits before feature negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param protocol_features
+ *  A pointer to store the queried protocol feature bits
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_get_protocol_features(const char *path,
+   uint64_t *protocol_features);
+
+/**
+ * Get the queue number bits before feature negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param queue_num
+ *  A pointer to store the queried queue number bits
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
+
+/**
  * Get the feature bits after negotiation
  *
  * @param vid
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 5367ba771..0354740fa 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -216,6 +216,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket 
*vsocket)
 
vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
 
+   vhost_set_vdpa_eid(vid, vsocket->eid);
+   vhost_set_vdpa_did(vid, vsocket->did);
+
if (vsocket->dequeue_zero_copy)
vhost_enable_dequeue_zero_copy(vid);
 
@@ -667,11 +670,80 @@ int
 rte_vhost_driver_get_features(const char *path, uint64_t *features)
 {
struct vhost_user_socket *vsocket;
+   struct rte_vdpa_eng_attr attr;
+   int eid = -1;
 
pthread_mutex_lock(&vhost_user.mutex);
vsocket = find_vhost_user_socket(path);
-   if (vsocket)
-   *features = vsocket->features;
+   if (vsocket) {
+   eid = vsocket->eid;
+   if (rte_vdpa_info_query(eid, &attr) < 0)
+   *features = vsocket->features;
+   else
+   *features = vsocket->features & attr.features;
+
+   }
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   if (!vsocket) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "socket file %s is not registered yet.\n", path);
+   return -1;
+   } else {
+   return 0;
+   }
+}
+
+int
+rte_vhost_driver_get_protocol_features(const char *path,
+   uint64_t *protocol_features)
+{
+   struct vhost_user_socket *vsocket;
+   struct rte_vdpa_eng_attr attr;
+   int eid = -1;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket) {
+   eid = vsocket->eid;
+   if (rte_vdpa_info_query(eid, &attr) < 0)
+   *protocol_features = VHOST_USER_PROTOCOL_FEATURES;
+   else
+   *protocol_features = VHOST_USER_PROTOCOL_FEATURES
+   & attr.protocol_features;
+
+   }
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   if (!vsocket) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "socket file %s is not registered yet.\n", path);
+   return -1;
+   } else {
+   return 0;
+   }
+}
+
+int
+rte_vhost_driver_get_queue_num(const char *path,
+   uint32_t *queue_num)
+{
+   struct vhost_user_socket *vsocket;
+   struct rte_vdpa_eng_attr attr;
+   int eid = -1;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket) {
+   eid = vsocket->eid;
+   if (rte_vdpa_info_query(eid, &attr) < 0)
+   *queue_num = VHOST_MAX_QUEUE_PAIRS;
+   else if (attr.queue_num > VHOST_MAX_QUEUE_PAIRS)
+   *queue_num = VHOST_MAX_QUEUE_PAIRS;
+   else
+   *queue_num = attr.queue_num;
+
+   }
pthread_mutex_unlock(&vhost_user.mutex);
 
if (!vsocket) {
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 45cf90f99..f8a5a1c42 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/l

[dpdk-dev] [PATCH v2 5/6] vhost: add apis for live migration

2018-03-05 Thread Zhihong Wang
This patch adds APIs to enable live migration for non-builtin data paths.

At the source side, the last_avail/used_idx values from the device need to be
set into the virtio_net structure, and the log_base and log_size from the
virtio_net structure need to be set into the device.

At the destination side, last_avail/used_idx need to be read from the
virtio_net structure and set into the device.
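
A rough sketch of how a vDPA driver might use these APIs (assuming the
prototypes added in this patch; the hardware accesses are only indicated by
comments, and the function names are hypothetical):

static void
vdpa_lm_src_side(int vid, uint16_t qid,
                 uint16_t hw_last_avail, uint16_t hw_last_used)
{
        uint64_t log_base, log_size;

        /* Program dirty-page logging into the device from the addresses
         * negotiated over vhost-user. */
        rte_vhost_get_log_base(vid, &log_base, &log_size);
        /* ... write log_base/log_size to the device ... */

        /* Record the device's ring indexes so they are migrated. */
        rte_vhost_set_vring_base(vid, qid, hw_last_avail, hw_last_used);
}

static void
vdpa_lm_dst_side(int vid, uint16_t qid)
{
        uint16_t last_avail_idx, last_used_idx;

        /* Resume the device queue from where the source stopped. */
        rte_vhost_get_vring_base(vid, qid, &last_avail_idx, &last_used_idx);
        /* ... write last_avail_idx/last_used_idx to the device ... */
}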

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/rte_vhost.h | 49 ++
 lib/librte_vhost/vhost.c | 63 
 2 files changed, 112 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 9bec36756..48005d9ff 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -512,6 +512,55 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
 
 /**
+ * Get log base and log size of the vhost device
+ *
+ * @param vid
+ *  vhost device ID
+ * @param log_base
+ *  vhost log base
+ * @param log_size
+ *  vhost log size
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_log_base(int vid, uint64_t *log_base,
+   uint64_t *log_size);
+
+/**
+ * Get last_avail/used_idx of the vhost virtqueue
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  vhost queue index
+ * @param last_avail_idx
+ *  vhost last_avail_idx to get
+ * @param last_used_idx
+ *  vhost last_used_idx to get
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_vring_base(int vid, uint16_t queue_id,
+   uint16_t *last_avail_idx, uint16_t *last_used_idx);
+
+/**
+ * Set last_avail/used_idx of the vhost virtqueue
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  vhost queue index
+ * @param last_avail_idx
+ *  last_avail_idx to set
+ * @param last_used_idx
+ *  last_used_idx to set
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
+   uint16_t last_avail_idx, uint16_t last_used_idx);
+
+/**
  * Get vdpa engine id for vhost device.
  *
  * @param vid
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f8a5a1c42..c7332c557 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -667,3 +667,66 @@ int rte_vhost_get_vdpa_did(int vid)
 
return dev->did;
 }
+
+int rte_vhost_get_log_base(int vid, uint64_t *log_base,
+   uint64_t *log_size)
+{
+   struct virtio_net *dev = get_device(vid);
+
+   if (!dev)
+   return -1;
+
+   if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "(%d) %s: built-in vhost net backend is disabled.\n",
+   dev->vid, __func__);
+   return -1;
+   }
+
+   *log_base = dev->log_base;
+   *log_size = dev->log_size;
+
+   return 0;
+}
+
+int rte_vhost_get_vring_base(int vid, uint16_t queue_id,
+   uint16_t *last_avail_idx, uint16_t *last_used_idx)
+{
+   struct virtio_net *dev = get_device(vid);
+
+   if (!dev)
+   return -1;
+
+   if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "(%d) %s: built-in vhost net backend is disabled.\n",
+   dev->vid, __func__);
+   return -1;
+   }
+
+   *last_avail_idx = dev->virtqueue[queue_id]->last_avail_idx;
+   *last_used_idx = dev->virtqueue[queue_id]->last_used_idx;
+
+   return 0;
+}
+
+int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
+   uint16_t last_avail_idx, uint16_t last_used_idx)
+{
+   struct virtio_net *dev = get_device(vid);
+
+   if (!dev)
+   return -1;
+
+   if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "(%d) %s: built-in vhost net backend is disabled.\n",
+   dev->vid, __func__);
+   return -1;
+   }
+
+   dev->virtqueue[queue_id]->last_avail_idx = last_avail_idx;
+   dev->virtqueue[queue_id]->last_used_idx = last_used_idx;
+
+   return 0;
+}
-- 
2.13.6



[dpdk-dev] [PATCH v2 6/6] vhost: export new apis

2018-03-05 Thread Zhihong Wang
This patch exports new APIs as experimental.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/rte_vdpa.h| 16 +++-
 lib/librte_vhost/rte_vhost.h   | 33 ++---
 lib/librte_vhost/rte_vhost_version.map | 19 +++
 3 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index 1bde36f7f..23fb471be 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -100,15 +100,21 @@ extern struct rte_vdpa_engine *vdpa_engines[];
 extern uint32_t vdpa_engine_num;
 
 /* engine management */
-int rte_vdpa_register_engine(const char *name, struct rte_vdpa_eng_addr *addr);
-int rte_vdpa_unregister_engine(int eid);
+int __rte_experimental
+rte_vdpa_register_engine(const char *name, struct rte_vdpa_eng_addr *addr);
 
-int rte_vdpa_find_engine_id(struct rte_vdpa_eng_addr *addr);
+int __rte_experimental
+rte_vdpa_unregister_engine(int eid);
 
-int rte_vdpa_info_query(int eid, struct rte_vdpa_eng_attr *attr);
+int __rte_experimental
+rte_vdpa_find_engine_id(struct rte_vdpa_eng_addr *addr);
+
+int __rte_experimental
+rte_vdpa_info_query(int eid, struct rte_vdpa_eng_attr *attr);
 
 /* driver register api */
-void rte_vdpa_register_driver(struct rte_vdpa_eng_driver *drv);
+void __rte_experimental
+rte_vdpa_register_driver(struct rte_vdpa_eng_driver *drv);
 
 #define RTE_VDPA_REGISTER_DRIVER(nm, drv) \
 RTE_INIT(vdpainitfn_ ##nm); \
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 48005d9ff..d5589c543 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -187,7 +187,8 @@ int rte_vhost_driver_unregister(const char *path);
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_driver_set_vdpa_eid(const char *path, int eid);
+int __rte_experimental
+rte_vhost_driver_set_vdpa_eid(const char *path, int eid);
 
 /**
  * Set the device id, enforce single connection per socket
@@ -199,7 +200,8 @@ int rte_vhost_driver_set_vdpa_eid(const char *path, int 
eid);
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_driver_set_vdpa_did(const char *path, int did);
+int __rte_experimental
+rte_vhost_driver_set_vdpa_did(const char *path, int did);
 
 /**
  * Get the engine id
@@ -209,7 +211,8 @@ int rte_vhost_driver_set_vdpa_did(const char *path, int 
did);
  * @return
  *  Engine id, -1 on failure
  */
-int rte_vhost_driver_get_vdpa_eid(const char *path);
+int __rte_experimental
+rte_vhost_driver_get_vdpa_eid(const char *path);
 
 /**
  * Get the device id
@@ -219,7 +222,8 @@ int rte_vhost_driver_get_vdpa_eid(const char *path);
  * @return
  *  Device id, -1 on failure
  */
-int rte_vhost_driver_get_vdpa_did(const char *path);
+int __rte_experimental
+rte_vhost_driver_get_vdpa_did(const char *path);
 
 /**
  * Set the feature bits the vhost-user driver supports.
@@ -286,7 +290,8 @@ int rte_vhost_driver_get_features(const char *path, 
uint64_t *features);
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_driver_get_protocol_features(const char *path,
+int __rte_experimental
+rte_vhost_driver_get_protocol_features(const char *path,
uint64_t *protocol_features);
 
 /**
@@ -299,7 +304,8 @@ int rte_vhost_driver_get_protocol_features(const char *path,
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
+int __rte_experimental
+rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
 
 /**
  * Get the feature bits after negotiation
@@ -523,7 +529,8 @@ uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_get_log_base(int vid, uint64_t *log_base,
+int __rte_experimental
+rte_vhost_get_log_base(int vid, uint64_t *log_base,
uint64_t *log_size);
 
 /**
@@ -540,7 +547,8 @@ int rte_vhost_get_log_base(int vid, uint64_t *log_base,
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_get_vring_base(int vid, uint16_t queue_id,
+int __rte_experimental
+rte_vhost_get_vring_base(int vid, uint16_t queue_id,
uint16_t *last_avail_idx, uint16_t *last_used_idx);
 
 /**
@@ -557,7 +565,8 @@ int rte_vhost_get_vring_base(int vid, uint16_t queue_id,
  * @return
  *  0 on success, -1 on failure
  */
-int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
+int __rte_experimental
+rte_vhost_set_vring_base(int vid, uint16_t queue_id,
uint16_t last_avail_idx, uint16_t last_used_idx);
 
 /**
@@ -568,7 +577,8 @@ int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
  * @return
  *  engine id
  */
-int rte_vhost_get_vdpa_eid(int vid);
+int __rte_experimental
+rte_vhost_get_vdpa_eid(int vid);
 
 /**
  * Get vdpa device id for vhost device.
@@ -578,7 +588,8 @@ int rte_vhost_get_vdpa_eid(int vid);
  * @return
  *  device id
  */
-int rte_vhost_get_vdpa_did(int vid);
+int __rte_experimental
+rte_vhost_get_v

[dpdk-dev] [PATCH] usertools: add support for AVP device

2018-03-05 Thread Xiaohua Zhang
Signed-off-by: Xiaohua Zhang 
---
 usertools/dpdk-devbind.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 18d9386..ff4b186 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -22,8 +22,10 @@
   'SVendor': None, 'SDevice': None}
 cavium_pkx = {'Class': '08', 'Vendor': '177d', 'Device': 'a0dd,a049',
   'SVendor': None, 'SDevice': None}
+avp_vnic = {'Class': '05', 'Vendor': '1af4', 'Device': '1110',
+  'SVendor': None, 'SDevice': None}
 
-network_devices = [network_class, cavium_pkx]
+network_devices = [network_class, cavium_pkx, avp_vnic]
 crypto_devices = [encryption_class, intel_processor_class]
 eventdev_devices = [cavium_sso]
 mempool_devices = [cavium_fpa]
-- 
1.9.1



Re: [dpdk-dev] [PATCH] usertools: add support for AVP device

2018-03-05 Thread Ferruh Yigit
On 3/5/2018 9:16 AM, Xiaohua Zhang wrote:
> Signed-off-by: Xiaohua Zhang 

Acked-by: Ferruh Yigit 



Re: [dpdk-dev] [PATCH 2/4] bus/vdev: bus scan by multi-process channel

2018-03-05 Thread Burakov, Anatoly

On 04-Mar-18 3:30 PM, Jianfeng Tan wrote:

To scan the vdevs in the primary, we send a request to the primary process
to obtain the names of the vdevs.

Only the name is shared from the primary. In probe(), the device
driver is supposed to locate (or request more of) the detailed
information from the primary.

Signed-off-by: Jianfeng Tan 
---


Is there much point in having private vdevs? Granted, I'm not exactly a 
heavy user of vdevs, but to me this seems like a way to introduce 
more confusion. How do I tell which devices are shared between 
processes, and which are private to one process? Can I control which ones 
I get? To me it would seem better to just switch all 
vdevs to being shared.


--
Thanks,
Anatoly


Re: [dpdk-dev] 16.11.5 (LTS) patches review and test

2018-03-05 Thread Luca Boccassi
On Mon, 2018-02-26 at 11:34 +, Luca Boccassi wrote:
> Hi all,
> 
> Here is a list of patches targeted for LTS release 16.11.5. Please
> help review and test. The planned date for the final release is March
> the 5th, pending results from regression tests.
> Before that, please shout if anyone has objections with these
> patches being applied.
> 
> These patches are located at branch 16.11 of dpdk-stable repo:
> http://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Luca Boccassi

Status update: we are currently waiting for some test results, so
16.11.5 might be postponed by a couple of days. Will update when I know
more. Apologies for the delay.

-- 
Kind regards,
Luca Boccassi


Re: [dpdk-dev] [PATCH 31/41] ethdev: use contiguous allocation for DMA memory

2018-03-05 Thread Burakov, Anatoly

On 05-Mar-18 9:15 AM, Andrew Rybchenko wrote:

On 03/05/2018 12:08 PM, Burakov, Anatoly wrote:

On 03-Mar-18 2:05 PM, Andrew Rybchenko wrote:

On 03/03/2018 04:46 PM, Anatoly Burakov wrote:

This fixes the following drivers in one go:


Does it mean that these drivers are broken in the middle of the patch set 
and only fixed now?

If so, it would be good to avoid it. It breaks bisect.



Depends on the definition of "broken". Legacy memory mode will still 
work for all drivers throughout the patchset. As for the new memory mode, 
yes, it will be "broken in the middle of the patchset", but due to the 
fact that there's an enormous amount of code to review between the fbarray 
changes, malloc changes, contiguous allocation changes and the new 
rte_memzone APIs, I favored ease of code review over bisectability.


I can of course reorder and roll up several different patchsets and all 
the driver updates into one giant patch, but do you really want to be the 
one reviewing such a patch?


Is it possible to:
1. Introduce _contig function
2. Switch users of the contiguous allocation to it as you do now
3. Make the old function allocate possibly non-contiguous memory



Good point. I'll see if i can shuffle patches around for v2. Thanks!

--
Thanks,
Anatoly


Re: [dpdk-dev] 16.11.5 (LTS) patches review and test

2018-03-05 Thread Luca Boccassi
On Mon, 2018-03-05 at 11:31 +0530, gowrishankar muthukrishnan wrote:
> Hi Luca,
> On powerpc, to support i40e, we would like the below patch to be merged:
> 
> c3def6a8724 net/i40e: implement vector PMD for altivec
> 
> I have verified br-16.11 with the above commit (when cherry-picking, I
> needed to remove the release notes hunk, which was meant for the 17.05
> release; I hope that is fine here).
> Could you please merge the above?
> 
> Thanks,
> Gowrishankar

Hi,

This introduced a new PMD for that architecture, right?

If so, I can merge the patch, on the following conditions:

1) It will be disabled by default
2) Support and help in backporting will have to be provided by the
authors for the remaining lifetime of 16.11

Is this OK for you?

> On Monday 26 February 2018 05:04 PM, Luca Boccassi wrote:
> > Hi all,
> > 
> > Here is a list of patches targeted for LTS release 16.11.5. Please
> > help review and test. The planned date for the final release is
> > March
> > the 5th, pending results from regression tests.
> > Before that, please shout if anyone has objections with these
> > patches being applied.
> > 
> > These patches are located at branch 16.11 of dpdk-stable repo:
> >  http://dpdk.org/browse/dpdk-stable/
> > 
> > Thanks.
> > 
> > Luca Boccassi
> > 
> > ---
> > Ajit Khaparde (6):
> >    net/bnxt: support new PCI IDs
> >    net/bnxt: parse checksum offload flags
> >    net/bnxt: fix group info usage
> >    net/bnxt: fix broadcast cofiguration
> >    net/bnxt: fix size of Tx ring in HW
> >    net/bnxt: fix link speed setting with autoneg off
> > 
> > Akhil Goyal (1):
> >    examples/ipsec-secgw: fix corner case for SPI value
> > 
> > Alejandro Lucero (3):
> >    net/nfp: fix MTU settings
> >    net/nfp: fix jumbo settings
> >    net/nfp: fix CRC strip check behaviour
> > 
> > Anatoly Burakov (14):
> >    memzone: fix leak on allocation error
> >    malloc: protect stats with lock
> >    malloc: fix end for bounded elements
> >    vfio: fix enabled check on error
> >    app/procinfo: add compilation option in config
> >    test: register test as failed if setup failed
> >    test/table: fix uninitialized parameter
> >    test/memzone: fix wrong test
> >    test/memzone: handle previously allocated memzones
> >    usertools/devbind: remove unused function
> >    test/reorder: fix memory leak
> >    test/ring_perf: fix memory leak
> >    test/table: fix memory leak
> >    test/timer_perf: fix memory leak
> > 
> > Andriy Berestovskyy (1):
> >    keepalive: fix state alignment
> > 
> > Bao-Long Tran (1):
> >    examples/ip_pipeline: fix timer period unit
> > 
> > Beilei Xing (8):
> >    net/i40e: fix flow director Rx resource defect
> >    net/i40e: add warnings when writing global registers
> >    net/i40e: add debug logs when writing global registers
> >    net/i40e: fix multiple driver support issue
> >    net/i40e: fix interrupt conflict when using multi-driver
> >    net/i40e: fix Rx interrupt
> >    net/i40e: check multi-driver option parsing
> >    app/testpmd: fix flow director filter
> > 
> > Chas Williams (1):
> >    net/bonding: fix setting slave MAC addresses
> > 
> > David Harton (1):
> >    net/i40e: fix VF reset stats crash
> > 
> > Didier Pallard (1):
> >    net/virtio: fix incorrect cast
> > 
> > Dustin Lundquist (1):
> >    examples/exception_path: align stats on cache line
> > 
> > Erez Ferber (1):
> >    net/mlx5: fix MTU update
> > 
> > Ferruh Yigit (1):
> >    kni: fix build with kernel 4.15
> > 
> > Fiona Trahe (1):
> >    crypto/qat: fix null auth algo overwrite
> > 
> > Gowrishankar Muthukrishnan (2):
> >    eal/ppc: remove the braces in memory barrier macros
> >    eal/ppc: support sPAPR IOMMU for vfio-pci
> > 
> > Harish Patil (2):
> >    net/qede: fix to reject config with no Rx queue
> >    net/qede/base: fix VF LRO tunnel configuration
> > 
> > Hemant Agrawal (4):
> >    pmdinfogen: fix cross compilation for ARM big endian
> >    lpm: fix ARM big endian build
> >    net/i40e: fix ARM big endian build
> >    net/ixgbe: fix ARM big endian build
> > 
> > Hyong Youb Kim (1):
> >    net/enic: fix crash due to static max number of queues
> > 
> > Igor Ryzhov (1):
> >    net/i40e: fix flag for MAC address write
> > 
> > Ilya V. Matveychikov (2):
> >    eal: update assertion macro
> >    mbuf: cleanup function to get last segment
> > 
> > Jerin Jacob (3):
> >    net/thunderx: fix multi segment Tx function return
> >    test/crypto: fix missing include
> >    ethdev: fix data alignment
> > 
> > Jerry Lilijun (1):
> >    net/bonding: fix activated slave in 8023ad mode
> > 
> > Jianfeng Tan (3):
> >    vhost: fix crash
> >    net/vhost: fix log messages on create/destroy
> >    net/virtio-user: fix start with kernel vhost
> > 
> > Junjie Chen (3):
> >    vhost: fix deq

Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2018-03-05 Thread Hunt, David

Hi BL,

I have always used "intel_pstate=disable" in my kernel parameters at 
boot so as to disable the intel_pstate driver, and force the kernel to 
use the acpi-cpufreq driver:


# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq

This then gives me the following options for the governor:
['conservative', 'ondemand', 'userspace', 'powersave', 'performance', 
'schedutil']


Because DPDK threads typically poll, they appear as 100% busy to the 
p_state driver, so if you want to be able to change core frequency down 
(as in l3fwd-power), you need to use the acpi-cpufreq driver.
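
A minimal sketch of the per-core setup this implies, assuming the standard
librte_power API (forcing the environment is optional; rte_power_init() can
also auto-detect it):

#include <rte_power.h>

static int
init_core_power(unsigned int lcore_id)
{
        /* Force the ACPI cpufreq backend (optional). */
        if (rte_power_set_env(PM_ENV_ACPI_CPUFREQ) < 0)
                return -1;
        /* Fails if the acpi-cpufreq sysfs entries are not present. */
        if (rte_power_init(lcore_id) < 0)
                return -1;
        /* Scale this core down when the polling loop is mostly idle. */
        return rte_power_freq_down(lcore_id);
}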


I had a read through the docs just now, and this does not seem to be 
mentioned, so I'll do up a patch to give some information on the correct 
kernel parameters to use when using the power library.


Regards,
Dave.

On 2/3/2018 7:20 AM, long...@viettel.com.vn wrote:

Forgot to link the original thread.

http://dpdk.org/ml/archives/dev/2016-January/030930.html

-BL


-Original Message-
From: long...@viettel.com.vn [mailto:long...@viettel.com.vn]
Sent: Friday, March 2, 2018 2:19 PM
To: dev@dpdk.org
Cc: david.h...@intel.com; mh...@mhcomputing.net; helin.zh...@intel.com;
long...@viettel.com.vn
Subject: librte_power w/ intel_pstate cpufreq governor

Hi everybody,

I know this thread was from over 2 years ago but I ran into the same

problem

with l3fwd-power today.

Any updates on this?

-BL






Re: [dpdk-dev] [PATCH] ethdev: return diagnostic when setting MAC address

2018-03-05 Thread Adrien Mazarguil
On Tue, Feb 27, 2018 at 04:11:29PM +0100, Olivier Matz wrote:
> Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a
> return code is added to notify the caller (librte_ether) if an error
> occurred in the PMD.
> 
> The new default MAC address is now copied in dev->data->mac_addrs[0]
> only if the operation is successful.
> 
> The patch also updates all the PMDs accordingly.
> 
> Signed-off-by: Olivier Matz 
> ---
> 
> Hi,
> 
> This patch is the following of the discussion we had in this thread:
> https://dpdk.org/dev/patchwork/patch/32284/
> 
> I did my best to keep the consistency inside the PMDs. The behavior
> of eth_mac_addr_set() is inspired by other functions in the same
> PMD, usually eth_mac_addr_add(). For instance:
> - dpaa and dpaa2 return 0 on error.
> - some PMDs (bnxt, mlx5, ...?) do not return a -errno code (-1 or
>   positive values).
> - some PMDs (avf, tap) check if the address is the same and return 0
>   in that case. This could go in generic code?
> 
> I tried to use the following errors when relevant:
> - -EPERM when a VF is not allowed to do a change
> - -ENOTSUP if the function is not supported
> - -EIO if this is an unknown error from lower layer (hw or sdk)

Keep in mind EIO is currently documented in ethdev as somewhat
hot-plug-related, as in "device is unresponsive and likely unplugged". The
reaction of a hot-plug-aware application to such an error code might be to
close the device, possibly for the wrong reason.

I just wanted to point it out, I don't think it's a problem for this patch
but can't speak for all PMDs.

> - -EINVAL for other unknown errors
> 
> Please, PMD maintainers, feel free to comment if you ahve specific
> needs for your driver.

OK with the API change and it's fine for mlx4 and mlx5, with a few comments
regarding the latter, please see below.


> diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
> index 19c8a223d..c107794ce 100644
> --- a/drivers/net/mlx4/mlx4.h
> +++ b/drivers/net/mlx4/mlx4.h
> @@ -131,7 +131,7 @@ void mlx4_allmulticast_disable(struct rte_eth_dev *dev);
>  void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
>  int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
> uint32_t index, uint32_t vmdq);
> -void mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
> +int mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
>  int mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
>  int mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
>  void mlx4_stats_reset(struct rte_eth_dev *dev);
> diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
> index 3bc692731..2442e16a6 100644
> --- a/drivers/net/mlx4/mlx4_ethdev.c
> +++ b/drivers/net/mlx4/mlx4_ethdev.c
> @@ -701,11 +701,14 @@ mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
> vlan_id, int on)
>   *   Pointer to Ethernet device structure.
>   * @param mac_addr
>   *   MAC address to register.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
>   */
> -void
> +int
>  mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
>  {
> - mlx4_mac_addr_add(dev, mac_addr, 0, 0);
> + return mlx4_mac_addr_add(dev, mac_addr, 0, 0);
>  }
>  
>  /**
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 965c19f21..42e58d7f7 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -241,7 +241,7 @@ int priv_get_mac(struct priv *, uint8_t 
> (*)[ETHER_ADDR_LEN]);
>  void mlx5_mac_addr_remove(struct rte_eth_dev *, uint32_t);
>  int mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
> uint32_t);
> -void mlx5_mac_addr_set(struct rte_eth_dev *, struct ether_addr *);
> +int mlx5_mac_addr_set(struct rte_eth_dev *, struct ether_addr *);
>  
>  /* mlx5_rss.c */
>  
> diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
> index e8a8d4594..0dc4bec46 100644
> --- a/drivers/net/mlx5/mlx5_mac.c
> +++ b/drivers/net/mlx5/mlx5_mac.c
> @@ -118,10 +118,13 @@ mlx5_mac_addr_add(struct rte_eth_dev *dev, struct 
> ether_addr *mac,
>   *   Pointer to Ethernet device structure.
>   * @param mac_addr
>   *   MAC address to register.
> + *
> + * @return
> + *   0 on success.
>   */
> -void
> +int
>  mlx5_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
>  {
>   DEBUG("%p: setting primary MAC address", (void *)dev);
> - mlx5_mac_addr_add(dev, mac_addr, 0, 0);
> + return mlx5_mac_addr_add(dev, mac_addr, 0, 0);
>  }

With Nelio's errno rework for mlx5 [1][2], this change should end up being
similar to mlx4.

[1] http://dpdk.org/ml/archives/dev/2018-February/091668.html
[2] http://dpdk.org/ml/archives/dev/2018-February/091678.html

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 02/18] app/testpmd: support flow RSS level parsing

2018-03-05 Thread Xueming(Steven) Li
Thanks for reminding, I'll update in next version.

> -Original Message-
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Tuesday, February 27, 2018 9:10 PM
> To: Xueming(Steven) Li ; Wenzhuo Lu
> ; Jingjing Wu ; Thomas
> Monjalon ; Nélio Laranjeiro
> ; Adrien Mazarguil
> ; Shahaf Shuler 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 02/18] app/testpmd: support flow RSS level
> parsing
> 
> On 2/26/2018 3:09 PM, Xueming Li wrote:
> > Support new flow RSS level parameter to select inner or outer RSS
> > fields. Example:
> >
> >   flow create 0 ingress pattern eth  / ipv4 / udp dst is 4789 / vxlan
> > / end actions rss queues 1 2 end level 1 / end
> >
> > Signed-off-by: Xueming Li 
> > ---
> >  app/test-pmd/cmdline_flow.c | 27 +--
> 
> Isn't there any document file to update for this new parameter?



Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2018-03-05 Thread longtb5
Hi Dave,

Actually in my test lab which is a HP box running CentOS 7 on kernel version
3.10.0-693.5.2.el7.x86_64, the default cpufreq driver is pcc_cpufreq. So I 
guess 
disabling intel_pstate wouldn't help in my case.

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
pcc-cpufreq

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors 
conservative userspace powersave ondemand performance

According to the kernel documentation, pcc_cpufreq also doesn't export
scaling_available_frequencies in sysfs.

From the kernel doc:
"scaling_available_frequencies is not created in /sys. No intermediate
frequencies need to be listed because the BIOS will try to achieve any
frequency, within limits, requested by the governor. A frequency does not have
to be strictly associated with a P-state."

The lack of scaling_available_frequencies makes power_acpi_cpufreq_init()
complain, similar to the problem with intel_pstate in the other thread.
I have tried (though without much effort) to force the kernel
to use acpi-cpufreq instead, but without success.

Luckily, as quoted above, pcc_cpufreq supports setting an arbitrary frequency,
so a simple workaround for now is to fake a scaling_available_frequencies file
in another directory and then edit the code in librte_power to use that file
instead.

Regards,
-BL

> -Original Message-
> From: david.h...@intel.com [mailto:david.h...@intel.com]
> Sent: Monday, March 5, 2018 5:16 PM
> To: long...@viettel.com.vn; dev@dpdk.org
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 
> Hi BL,
> 
> I have always used "intel_pstate=disable" in my kernel parameters at boot so
> as to disable the intel_pstate driver, and force the kernel to use the acpi-
> cpufreq driver:
> 
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
> acpi-cpufreq
> 
> This then gives me the following options for the governor:
> ['conservative', 'ondemand', 'userspace', 'powersave', 'performance',
> 'schedutil']
> 
> Because DPDK threads typically poll, they appear as 100% busy to the p_state
> driver, so if you want to be able to change core frequency down (as in l3fwd-
> power), you need to use the acpi-cpufreq driver.
> 
> I had a read through the docs just now, and this does not seem to be
> mentioned, so I'll do up a patch to give some information on the correct
> kernel parameters to use when using the power library.
> 
> Regards,
> Dave.
> 
> On 2/3/2018 7:20 AM, long...@viettel.com.vn wrote:
> > Forgot to link the original thread.
> >
> > http://dpdk.org/ml/archives/dev/2016-January/030930.html
> >
> > -BL
> >
> >> -Original Message-
> >> From: long...@viettel.com.vn [mailto:long...@viettel.com.vn]
> >> Sent: Friday, March 2, 2018 2:19 PM
> >> To: dev@dpdk.org
> >> Cc: david.h...@intel.com; mh...@mhcomputing.net;
> >> helin.zh...@intel.com; long...@viettel.com.vn
> >> Subject: librte_power w/ intel_pstate cpufreq governor
> >>
> >> Hi everybody,
> >>
> >> I know this thread was from over 2 years ago but I ran into the same
> > problem
> >> with l3fwd-power today.
> >>
> >> Any updates on this?
> >>
> >> -BL
> >



Re: [dpdk-dev] [PATCH 32/41] crypto/qat: use contiguous allocation for DMA memory

2018-03-05 Thread Trahe, Fiona


> -Original Message-
> From: Burakov, Anatoly
> Sent: Saturday, March 3, 2018 1:46 PM
> To: dev@dpdk.org
> Cc: Griffin, John ; Trahe, Fiona 
> ; Jain, Deepak K
> ; Wiles, Keith ; Tan, Jianfeng
> ; andras.kov...@ericsson.com; 
> laszlo.vadk...@ericsson.com; Walker,
> Benjamin ; Richardson, Bruce 
> ;
> tho...@monjalon.net; Ananyev, Konstantin ; 
> Ramakrishnan,
> Kuralamudhan ; Daly, Louise M 
> ;
> nelio.laranje...@6wind.com; ys...@mellanox.com; peppe...@japf.ch;
> jerin.ja...@caviumnetworks.com; hemant.agra...@nxp.com; olivier.m...@6wind.com
> Subject: [PATCH 32/41] crypto/qat: use contiguous allocation for DMA memory
> 
> Signed-off-by: Anatoly Burakov 
Acked-by: Fiona Trahe 


Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/7] ethdev: fix port data reset timing

2018-03-05 Thread Ferruh Yigit
On 1/18/2018 4:35 PM, Matan Azrad wrote:
> rte_eth_dev_data structure is allocated per ethdev port and can be
> used to get the data of the port internally.
> 
> rte_eth_dev_attach_secondary tries to find the port identifier using
> rte_eth_dev_data name field comparison and may get the identifier of an
> invalid port in case this port was released by the primary process,
> because the port release API doesn't reset the port data.
> 
> So, it is better to reset the port data at release time instead of
> allocation time.
> 
> Move the port data reset to the port release API.
> 
> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process 
> model")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Matan Azrad 
> ---
>  lib/librte_ether/rte_ethdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 7044159..156231c 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -204,7 +204,6 @@ struct rte_eth_dev *
>   return NULL;
>   }
>  
> - memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
>   eth_dev = eth_dev_get(port_id);
>   snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
>   eth_dev->data->port_id = port_id;
> @@ -252,6 +251,7 @@ struct rte_eth_dev *
>   if (eth_dev == NULL)
>   return -EINVAL;
>  
> + memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));

Hi Matan,

What most of the vdev release path does is:

eth_dev = rte_eth_dev_allocated(...)
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);
rte_eth_dev_release_port(eth_dev);

Since eth_dev->data has been freed, calling memset() on it in
rte_eth_dev_release_port() will be a problem.

We don't run the remove path, which is why we didn't hit the issue, but this
seems to be a problem for all virtual PMDs.
Also rte_eth_dev_pci_release() looks problematic now.
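
In other words, with this patch the sequence above becomes (annotated
sketch):

eth_dev = rte_eth_dev_allocated(name);
if (eth_dev == NULL)
	return -ENODEV;
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);            /* data is freed here ...          */
rte_eth_dev_release_port(eth_dev);  /* ... then memset(eth_dev->data)  */
                                    /* writes to freed memory.         */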

Can you please check the issue?


Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2018-03-05 Thread Hunt, David


Hi BL,


On 5/3/2018 10:48 AM, long...@viettel.com.vn wrote:

Hi Dave,

Actually in my test lab which is a HP box running CentOS 7 on kernel version
3.10.0-693.5.2.el7.x86_64, the default cpufreq driver is pcc_cpufreq. So I guess
disabling intel_pstate wouldn't help in my case.

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
pcc-cpufreq

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance

According to kernel doc, pcc_cpufreq also doesn't export 
scaling_availabe_frequencies
in sysfs.

 From kernel doc:
"scaling_available_frequencies is not created in /sys. No intermediate
frequencies need to be listed because the BIOS will try to achieve any
frequency, within limits, requested by the governor. A frequency does not have
to be strictly associated with a P-state."

The lack of scaling_availabe_frequencies makes power_acpi_cpufreq_init()
complains, similar to the problem with intel_pstate as  in the other thread.
I have tried (though with not much effort) to force the kernel
to use acpi-cpufreq instead but without success.

Luckily, as quoted above pcc_cpufreq supports setting of arbitrary frequency,
so a simple workaround for now is to fake a scaling_available_frequencies file
in another directory, then edit the code in librte_power to use that file 
instead.

Regards,
-BL


-Original Message-
From: david.h...@intel.com [mailto:david.h...@intel.com]
Sent: Monday, March 5, 2018 5:16 PM
To: long...@viettel.com.vn; dev@dpdk.org
Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor

Hi BL,

I have always used "intel_pstate=disable" in my kernel parameters at boot so
as to disable the intel_pstate driver, and force the kernel to use the acpi-
cpufreq driver:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq

This then gives me the following options for the governor:
['conservative', 'ondemand', 'userspace', 'powersave', 'performance',
'schedutil']

Because DPDK threads typically poll, they appear as 100% busy to the p_state
driver, so if you want to be able to change core frequency down (as in l3fwd-
power), you need to use the acpi-cpufreq driver.

I had a read through the docs just now, and this does not seem to be
mentioned, so I'll do up a patch to give some information on the correct
kernel parameters to use when using the power library.

Regards,
Dave.

On 2/3/2018 7:20 AM, long...@viettel.com.vn wrote:

Forgot to link the original thread.

http://dpdk.org/ml/archives/dev/2016-January/030930.html

-BL


-Original Message-
From: long...@viettel.com.vn [mailto:long...@viettel.com.vn]
Sent: Friday, March 2, 2018 2:19 PM
To: dev@dpdk.org
Cc: david.h...@intel.com; mh...@mhcomputing.net;
helin.zh...@intel.com; long...@viettel.com.vn
Subject: librte_power w/ intel_pstate cpufreq governor

Hi everybody,

I know this thread was from over 2 years ago but I ran into the same

problem

with l3fwd-power today.

Any updates on this?

-BL


Good to hear you found a workaround.

So the issue really is "Getting the Power Library working with the
pcc-cpufreq kernel driver" :)


From wiki.archlinux.org:
pcc-cpufreq: This driver supports the Processor Clocking Control interface by
Hewlett-Packard and Microsoft Corporation, which is useful on some
ProLiant servers.


In the following doc:
https://www.kernel.org/doc/Documentation/cpu-freq/pcc-cpufreq.txt
it mentions: "When PCC mode is enabled, the platform will not expose
processor performance or throttle states (_PSS, _TSS and related ACPI
objects) to OSPM. Therefore, the native P-state driver (such as
acpi-cpufreq for Intel, powernow-k8 for AMD) will not load".
Is there a way to disable PCC mode in the BIOS on that server? From that
wording, it seems to imply that there is a way to disable PCC
(seeing that it can be enabled).


If you can't disable PCC, I would suggest that a patch may be needed to
allow the power library to detect whether it's using acpi or pcc, and obtain
a list of cpu frequencies accordingly. However, I don't have any HP
servers available to me, so I'm currently unable to research a method of
getting a list of valid cpu frequencies on a machine using the pcc driver.
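
Something along these lines might be a starting point for the detection
half (untested sketch; the macro and function names are made up, they are
not existing librte_power code):

#include <stdio.h>
#include <string.h>

#define SYSFILE_SCALING_DRIVER \
	"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver"

/* Return 1 for pcc-cpufreq, 0 for acpi-cpufreq, -1 on error/unknown. */
static int
power_detect_cpufreq_driver(unsigned int lcore_id)
{
	char path[128];
	char drv[32];
	FILE *f;

	snprintf(path, sizeof(path), SYSFILE_SCALING_DRIVER, lcore_id);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(drv, sizeof(drv), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (strncmp(drv, "pcc-cpufreq", strlen("pcc-cpufreq")) == 0)
		return 1;
	if (strncmp(drv, "acpi-cpufreq", strlen("acpi-cpufreq")) == 0)
		return 0;
	return -1;
}

The init code could then pick the acpi path or a pcc-specific path (or at
least print a clearer error) based on the result.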


If you come up with a snippet of code for listing available frequencies 
on that server, let me know and we can look at adding that into the 
power library. :)


Regards,
Dave.






[dpdk-dev] [PATCH v3 02/10] net/mlx5: name parameters in function prototypes

2018-03-05 Thread Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.h  | 191 ---
 drivers/net/mlx5/mlx5_rxtx.h | 162 
 2 files changed, 195 insertions(+), 158 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5e90d99cc..b65962df9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -206,113 +206,132 @@ int mlx5_getenv_int(const char *);
 
 struct priv *mlx5_get_priv(struct rte_eth_dev *dev);
 int mlx5_is_secondary(void);
-int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
-int priv_ifreq(const struct priv *, int req, struct ifreq *);
-int priv_get_mtu(struct priv *, uint16_t *);
-int priv_set_flags(struct priv *, unsigned int, unsigned int);
-int mlx5_dev_configure(struct rte_eth_dev *);
-void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
+int priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE]);
+int priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr);
+int priv_get_mtu(struct priv *priv, uint16_t *mtu);
+int priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags);
+int mlx5_dev_configure(struct rte_eth_dev *dev);
+void mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info 
*info);
 const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
-int priv_link_update(struct priv *, int);
-int priv_force_link_status_change(struct priv *, int);
-int mlx5_link_update(struct rte_eth_dev *, int);
-int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
-int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
-int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
-int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
-   struct rte_pci_addr *);
-void mlx5_dev_link_status_handler(void *);
-void mlx5_dev_interrupt_handler(void *);
-void priv_dev_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
-void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
+int priv_link_update(struct priv *priv, int wait_to_complete);
+int priv_force_link_status_change(struct priv *priv, int status);
+int mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete);
+int mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);
+int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev,
+  struct rte_eth_fc_conf *fc_conf);
+int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev,
+  struct rte_eth_fc_conf *fc_conf);
+int mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
+   struct rte_pci_addr *pci_addr);
+void mlx5_dev_link_status_handler(void *arg);
+void mlx5_dev_interrupt_handler(void *cb_arg);
+void priv_dev_interrupt_handler_uninstall(struct priv *priv,
+ struct rte_eth_dev *dev);
+void priv_dev_interrupt_handler_install(struct priv *priv,
+   struct rte_eth_dev *dev);
 int mlx5_set_link_down(struct rte_eth_dev *dev);
 int mlx5_set_link_up(struct rte_eth_dev *dev);
+eth_tx_burst_t priv_select_tx_function(struct priv *priv,
+  struct rte_eth_dev *dev);
+eth_rx_burst_t priv_select_rx_function(struct priv *priv,
+  struct rte_eth_dev *dev);
 int mlx5_is_removed(struct rte_eth_dev *dev);
-eth_tx_burst_t priv_select_tx_function(struct priv *, struct rte_eth_dev *);
-eth_rx_burst_t priv_select_rx_function(struct priv *, struct rte_eth_dev *);
 
 /* mlx5_mac.c */
 
-int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
-void mlx5_mac_addr_remove(struct rte_eth_dev *, uint32_t);
-int mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
- uint32_t);
-void mlx5_mac_addr_set(struct rte_eth_dev *, struct ether_addr *);
+int priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN]);
+void mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+int mlx5_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac,
+ uint32_t index, uint32_t vmdq);
+void mlx5_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
 
 /* mlx5_rss.c */
 
-int mlx5_rss_hash_update(struct rte_eth_dev *, struct rte_eth_rss_conf *);
-int mlx5_rss_hash_conf_get(struct rte_eth_dev *, struct rte_eth_rss_conf *);
-int priv_rss_reta_index_resize(struct priv *, unsigned int);
-int mlx5_dev_rss_reta_query(struct rte_eth_dev *,
-   struct rte_eth_rss_reta_entry64 *, uint16_t);
-int mlx5_dev_rss_reta_update(struct rte_eth_dev *,
-struct rte_eth_rss_reta_entry64 *, uint16_t);
+int mlx5_rss_hash_update(struct rte_eth_dev *dev,
+struct rte_eth_rss_conf *rss_conf);
+int mlx5_rss_hash_conf_get(struct rte_eth_dev *dev,
+

[dpdk-dev] [PATCH v3 03/10] net/mlx5: mark parameters with unused attribute

2018-03-05 Thread Nelio Laranjeiro
Replace all (void)foo; with the __rte_unused macro, except when variables
are under #if statements.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  4 ++--
 drivers/net/mlx5/mlx5_ethdev.c  | 18 +--
 drivers/net/mlx5/mlx5_flow.c| 25 
 drivers/net/mlx5/mlx5_mac.c |  3 +--
 drivers/net/mlx5/mlx5_mr.c  | 10 +++-
 drivers/net/mlx5/mlx5_rxq.c |  4 ++--
 drivers/net/mlx5/mlx5_rxtx.c| 51 +
 drivers/net/mlx5/mlx5_stats.c   |  2 +-
 drivers/net/mlx5/mlx5_trigger.c |  4 ++--
 drivers/net/mlx5/mlx5_txq.c | 19 +++
 10 files changed, 55 insertions(+), 85 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7e8a214ce..cdf99b5ad 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -571,7 +571,8 @@ priv_uar_init_secondary(struct priv *priv)
  *   0 on success, negative errno value on failure.
  */
 static int
-mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+  struct rte_pci_device *pci_dev)
 {
struct ibv_device **list;
struct ibv_device *ibv_dev;
@@ -588,7 +589,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct ibv_counter_set_description cs_desc;
 #endif
 
-   (void)pci_drv;
assert(pci_drv == &mlx5_driver);
/* Get mlx5_dev[] index. */
idx = mlx5_dev_idx(&pci_dev->addr);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f98fc4c3b..0c383deba 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -467,11 +467,9 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device structure.
- * @param wait_to_complete
- *   Wait for request completion (ignored).
  */
 static int
-mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev, int wait_to_complete)
+mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev)
 {
struct priv *priv = dev->data->dev_private;
struct ethtool_cmd edata = {
@@ -483,7 +481,6 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev, int 
wait_to_complete)
 
/* priv_lock() is not taken to allow concurrent calls. */
 
-   (void)wait_to_complete;
if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
return -1;
@@ -533,11 +530,9 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev, 
int wait_to_complete)
  *
  * @param dev
  *   Pointer to Ethernet device structure.
- * @param wait_to_complete
- *   Wait for request completion (ignored).
  */
 static int
-mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int wait_to_complete)
+mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev)
 {
struct priv *priv = dev->data->dev_private;
struct ethtool_link_settings gcmd = { .cmd = ETHTOOL_GLINKSETTINGS };
@@ -545,7 +540,6 @@ mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev, int 
wait_to_complete)
struct rte_eth_link dev_link;
uint64_t sc;
 
-   (void)wait_to_complete;
if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
return -1;
@@ -675,7 +669,7 @@ priv_link_stop(struct priv *priv)
  *   Wait for request completion (ignored).
  */
 int
-priv_link_update(struct priv *priv, int wait_to_complete)
+priv_link_update(struct priv *priv, int wait_to_complete __rte_unused)
 {
struct rte_eth_dev *dev = priv->dev;
struct utsname utsname;
@@ -687,9 +681,9 @@ priv_link_update(struct priv *priv, int wait_to_complete)
sscanf(utsname.release, "%d.%d.%d",
   &ver[0], &ver[1], &ver[2]) != 3 ||
KERNEL_VERSION(ver[0], ver[1], ver[2]) < KERNEL_VERSION(4, 9, 0))
-   ret = mlx5_link_update_unlocked_gset(dev, wait_to_complete);
+   ret = mlx5_link_update_unlocked_gset(dev);
else
-   ret = mlx5_link_update_unlocked_gs(dev, wait_to_complete);
+   ret = mlx5_link_update_unlocked_gs(dev);
/* If lsc interrupt is disabled, should always be ready for traffic. */
if (!dev->data->dev_conf.intr_conf.lsc) {
priv_link_start(priv);
@@ -741,7 +735,7 @@ priv_force_link_status_change(struct priv *priv, int status)
  *   Wait for request completion (ignored).
  */
 int
-mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
 {
struct priv *priv = dev->data->dev_private;
int ret;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 42381c578..bb98fb4c5 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -526,7 +

[dpdk-dev] [PATCH v3 01/10] net/mlx5: fix sriov flag

2018-03-05 Thread Nelio Laranjeiro
priv_get_num_vfs() was used to help the PMD prefetch the mbuf in the
datapath when the PMD was behaving in VF mode.
This knowledge is no longer used.

Fixes: 528a9fbec6de ("net/mlx5: support ConnectX-5 devices")
Cc: ys...@mellanox.com

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c| 18 ++
 drivers/net/mlx5/mlx5.h|  2 --
 drivers/net/mlx5/mlx5_ethdev.c | 37 -
 3 files changed, 2 insertions(+), 55 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 61cb93101..7e8a214ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -578,7 +578,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
int err = 0;
struct ibv_context *attr_ctx = NULL;
struct ibv_device_attr_ex device_attr;
-   unsigned int sriov;
unsigned int mps;
unsigned int cqe_comp;
unsigned int tunnel_en = 0;
@@ -625,18 +624,8 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
(pci_dev->addr.devid != pci_addr.devid) ||
(pci_dev->addr.function != pci_addr.function))
continue;
-   sriov = ((pci_dev->id.device_id ==
-  PCI_DEVICE_ID_MELLANOX_CONNECTX4VF) ||
- (pci_dev->id.device_id ==
-  PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF) ||
- (pci_dev->id.device_id ==
-  PCI_DEVICE_ID_MELLANOX_CONNECTX5VF) ||
- (pci_dev->id.device_id ==
-  PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF));
-   INFO("PCI information matches, using device \"%s\""
-" (SR-IOV: %s)",
-list[i]->name,
-sriov ? "true" : "false");
+   INFO("PCI information matches, using device \"%s\"",
+list[i]->name);
attr_ctx = mlx5_glue->open_device(list[i]);
err = errno;
break;
@@ -709,7 +698,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct rte_eth_dev *eth_dev;
struct ibv_device_attr_ex device_attr_ex;
struct ether_addr mac;
-   uint16_t num_vfs = 0;
struct ibv_device_attr_ex device_attr;
struct mlx5_dev_config config = {
.cqe_comp = cqe_comp,
@@ -870,8 +858,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
DEBUG("hardware RX end alignment padding is %ssupported",
  (config.hw_padding ? "" : "not "));
 
-   priv_get_num_vfs(priv, &num_vfs);
-   config.sriov = (num_vfs || sriov);
config.tso = ((device_attr_ex.tso_caps.max_tso > 0) &&
  (device_attr_ex.tso_caps.supported_qpts &
  (1 << IBV_QPT_RAW_PACKET)));
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9ad0533fc..5e90d99cc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -78,7 +78,6 @@ struct mlx5_dev_config {
unsigned int hw_vlan_strip:1; /* VLAN stripping is supported. */
unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
unsigned int hw_padding:1; /* End alignment padding is supported. */
-   unsigned int sriov:1; /* This is a VF or PF with VF devices. */
unsigned int mps:2; /* Multi-packet send supported mode. */
unsigned int tunnel_en:1;
/* Whether tunnel stateless offloads are supported. */
@@ -209,7 +208,6 @@ struct priv *mlx5_get_priv(struct rte_eth_dev *dev);
 int mlx5_is_secondary(void);
 int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
 int priv_ifreq(const struct priv *, int req, struct ifreq *);
-int priv_get_num_vfs(struct priv *, uint16_t *);
 int priv_get_mtu(struct priv *, uint16_t *);
 int priv_set_flags(struct priv *, unsigned int, unsigned int);
 int mlx5_dev_configure(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index b73cb53df..f98fc4c3b 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -201,43 +201,6 @@ priv_ifreq(const struct priv *priv, int req, struct ifreq 
*ifr)
 }
 
 /**
- * Return the number of active VFs for the current device.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[out] num_vfs
- *   Number of active VFs.
- *
- * @return
- *   0 on success, -1 on failure and errno is set.
- */
-int
-priv_get_num_vfs(struct priv *priv, uint16_t *num_vfs)
-{
-   /* The sysfs entry name depends on the operating system. */
-   const char **name = (const char *[]){
-   "sriov_numvfs",
-   "mlx5_num_vfs",
-   NULL,
-   };

[dpdk-dev] [PATCH v3 04/10] net/mlx5: normalize function prototypes

2018-03-05 Thread Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_flow.c |  2 +-
 drivers/net/mlx5/mlx5_mr.c   | 11 ++-
 drivers/net/mlx5/mlx5_rxq.c  | 16 
 drivers/net/mlx5/mlx5_txq.c  |  8 
 4 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index bb98fb4c5..d8d124749 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -866,7 +866,7 @@ priv_flow_convert_items_validate(struct priv *priv 
__rte_unused,
  * @return
  *   A verbs flow attribute on success, NULL otherwise.
  */
-static struct ibv_flow_attr*
+static struct ibv_flow_attr *
 priv_flow_convert_allocate(struct priv *priv __rte_unused,
   unsigned int priority,
   unsigned int size,
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 38a8e2f40..4e1495800 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -66,8 +66,9 @@ mlx5_check_mempool_cb(struct rte_mempool *mp __rte_unused,
  * @return
  *   0 on success (mempool is virtually contiguous), -1 on error.
  */
-static int mlx5_check_mempool(struct rte_mempool *mp, uintptr_t *start,
-   uintptr_t *end)
+static int
+mlx5_check_mempool(struct rte_mempool *mp, uintptr_t *start,
+  uintptr_t *end)
 {
struct mlx5_check_mempool_data data;
 
@@ -97,7 +98,7 @@ static int mlx5_check_mempool(struct rte_mempool *mp, 
uintptr_t *start,
  * @return
  *   mr on success, NULL on failure.
  */
-struct mlx5_mr*
+struct mlx5_mr *
 priv_txq_mp2mr_reg(struct priv *priv, struct mlx5_txq_data *txq,
   struct rte_mempool *mp, unsigned int idx)
 {
@@ -244,7 +245,7 @@ mlx5_mp2mr_iter(struct rte_mempool *mp, void *arg)
  * @return
  *   The memory region on success.
  */
-struct mlx5_mr*
+struct mlx5_mr *
 priv_mr_new(struct priv *priv, struct rte_mempool *mp)
 {
const struct rte_memseg *ms = rte_eal_get_physmem_layout();
@@ -304,7 +305,7 @@ priv_mr_new(struct priv *priv, struct rte_mempool *mp)
  * @return
  *   The memory region on success.
  */
-struct mlx5_mr*
+struct mlx5_mr *
 priv_mr_get(struct priv *priv, struct rte_mempool *mp)
 {
struct mlx5_mr *mr;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8b9cc1dd0..2fc6e08aa 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -601,7 +601,7 @@ mlx5_rx_intr_disable(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
  * @return
  *   The Verbs object initialised if it can be created.
  */
-struct mlx5_rxq_ibv*
+struct mlx5_rxq_ibv *
 mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 {
struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
@@ -819,7 +819,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_rxq_ibv*
+struct mlx5_rxq_ibv *
 mlx5_priv_rxq_ibv_get(struct priv *priv, uint16_t idx)
 {
struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
@@ -932,7 +932,7 @@ mlx5_priv_rxq_ibv_releasable(struct priv *priv __rte_unused,
  * @return
  *   A DPDK queue object on success.
  */
-struct mlx5_rxq_ctrl*
+struct mlx5_rxq_ctrl *
 mlx5_priv_rxq_new(struct priv *priv, uint16_t idx, uint16_t desc,
  unsigned int socket, const struct rte_eth_rxconf *conf,
  struct rte_mempool *mp)
@@ -1057,7 +1057,7 @@ mlx5_priv_rxq_new(struct priv *priv, uint16_t idx, 
uint16_t desc,
  * @return
  *   A pointer to the queue if it exists.
  */
-struct mlx5_rxq_ctrl*
+struct mlx5_rxq_ctrl *
 mlx5_priv_rxq_get(struct priv *priv, uint16_t idx)
 {
struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
@@ -1170,7 +1170,7 @@ mlx5_priv_rxq_verify(struct priv *priv)
  * @return
  *   A new indirection table.
  */
-struct mlx5_ind_table_ibv*
+struct mlx5_ind_table_ibv *
 mlx5_priv_ind_table_ibv_new(struct priv *priv, uint16_t queues[],
uint16_t queues_n)
 {
@@ -1232,7 +1232,7 @@ mlx5_priv_ind_table_ibv_new(struct priv *priv, uint16_t 
queues[],
  * @return
  *   An indirection table if found.
  */
-struct mlx5_ind_table_ibv*
+struct mlx5_ind_table_ibv *
 mlx5_priv_ind_table_ibv_get(struct priv *priv, uint16_t queues[],
uint16_t queues_n)
 {
@@ -1331,7 +1331,7 @@ mlx5_priv_ind_table_ibv_verify(struct priv *priv)
  * @return
  *   An hash Rx queue on success.
  */
-struct mlx5_hrxq*
+struct mlx5_hrxq *
 mlx5_priv_hrxq_new(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
 {
@@ -1400,7 +1400,7 @@ mlx5_priv_hrxq_new(struct priv *priv, uint8_t *rss_key, 
uint8_t rss_key_len,
  * @return
  *   An hash Rx queue on success.
  */
-struct mlx5_hrxq*
+struct mlx5_hrxq *
 mlx5_priv_hrxq_get(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
   uint64_t hash_fields, uint16_t queues[], u

[dpdk-dev] [PATCH v3 00/10] net/mlx5: clean driver

2018-03-05 Thread Nelio Laranjeiro
- Removes unused SR-IOV flag.
- Adds missing documentation on some functions.
- Removes the spin-lock on the private structure.
- Standardize the return values of all functions as discussed on the mailing
  list [1].

[1] https://dpdk.org/ml/archives/dev/2018-January/087991.html

Changes in v2:

 - fix a segfault in Tx queue release.

Nelio Laranjeiro (10):
  net/mlx5: fix sriov flag
  net/mlx5: name parameters in function prototypes
  net/mlx5: mark parameters with unused attribute
  net/mlx5: normalize function prototypes
  net/mlx5: add missing function documentation
  net/mlx5: remove useless empty lines
  net/mlx5: remove control path locks
  net/mlx5: prefix all function with mlx5
  net/mlx5: change non failing function return values
  net/mlx5: standardize on negative errno values

 drivers/net/mlx5/mlx5.c  | 236 ++
 drivers/net/mlx5/mlx5.h  | 240 ++
 drivers/net/mlx5/mlx5_ethdev.c   | 611 +++
 drivers/net/mlx5/mlx5_flow.c | 664 ---
 drivers/net/mlx5/mlx5_mac.c  |  42 ++-
 drivers/net/mlx5/mlx5_mr.c   | 130 
 drivers/net/mlx5/mlx5_rss.c  | 159 --
 drivers/net/mlx5/mlx5_rxmode.c   |  28 +-
 drivers/net/mlx5/mlx5_rxq.c  | 488 ++--
 drivers/net/mlx5/mlx5_rxtx.c |  49 ++-
 drivers/net/mlx5/mlx5_rxtx.h | 161 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c |  25 +-
 drivers/net/mlx5/mlx5_socket.c   | 115 ---
 drivers/net/mlx5/mlx5_stats.c| 189 +--
 drivers/net/mlx5/mlx5_trigger.c  | 234 +++---
 drivers/net/mlx5/mlx5_txq.c  | 229 +++---
 drivers/net/mlx5/mlx5_vlan.c |  93 ++
 17 files changed, 1761 insertions(+), 1932 deletions(-)

-- 
2.11.0



[dpdk-dev] [PATCH v3 06/10] net/mlx5: remove useless empty lines

2018-03-05 Thread Nelio Laranjeiro
Some empty lines have been added in the middle of the code without any
reason.  This commit removes them.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c| 22 --
 drivers/net/mlx5/mlx5_ethdev.c |  7 ---
 drivers/net/mlx5/mlx5_mr.c |  1 -
 drivers/net/mlx5/mlx5_rss.c|  2 --
 drivers/net/mlx5/mlx5_rxq.c|  1 -
 drivers/net/mlx5/mlx5_vlan.c   |  6 --
 6 files changed, 39 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cdf99b5ad..91149ccee 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -597,7 +597,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
return -ENOMEM;
}
DEBUG("using driver device index %d", idx);
-
/* Save PCI address. */
mlx5_dev[idx].pci_addr = pci_dev->addr;
list = mlx5_glue->get_device_list(&i);
@@ -644,7 +643,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
return -err;
}
ibv_dev = list[i];
-
DEBUG("device opened");
/*
 * Multi-packet send is supported by ConnectX-4 Lx PF as well
@@ -685,7 +683,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
if (mlx5_glue->query_device_ex(attr_ctx, NULL, &device_attr))
goto error;
INFO("%u port(s) detected", device_attr.orig_attr.phys_port_cnt);
-
for (i = 0; i < device_attr.orig_attr.phys_port_cnt; i++) {
char name[RTE_ETH_NAME_MAX_LEN];
int len;
@@ -716,9 +713,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 pci_dev->addr.devid, pci_dev->addr.function);
if (device_attr.orig_attr.phys_port_cnt > 1)
snprintf(name + len, sizeof(name), " port %u", i);
-
mlx5_dev[idx].ports |= test;
-
if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
eth_dev = rte_eth_dev_attach_secondary(name);
if (eth_dev == NULL) {
@@ -755,15 +750,12 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv 
__rte_unused,
priv_select_tx_function(priv, eth_dev);
continue;
}
-
DEBUG("using port %u (%08" PRIx32 ")", port, test);
-
ctx = mlx5_glue->open_device(ibv_dev);
if (ctx == NULL) {
err = ENODEV;
goto port_error;
}
-
mlx5_glue->query_device_ex(ctx, NULL, &device_attr);
/* Check port status. */
err = mlx5_glue->query_port(ctx, port, &port_attr);
@@ -771,19 +763,16 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv 
__rte_unused,
ERROR("port query failed: %s", strerror(err));
goto port_error;
}
-
if (port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) {
ERROR("port %d is not configured in Ethernet mode",
  port);
err = EINVAL;
goto port_error;
}
-
if (port_attr.state != IBV_PORT_ACTIVE)
DEBUG("port %d is not active: \"%s\" (%d)",
  port, mlx5_glue->port_state_str(port_attr.state),
  port_attr.state);
-
/* Allocate protection domain. */
pd = mlx5_glue->alloc_pd(ctx);
if (pd == NULL) {
@@ -791,9 +780,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
err = ENOMEM;
goto port_error;
}
-
mlx5_dev[idx].ports |= test;
-
/* from rte_ethdev.c */
priv = rte_zmalloc("ethdev private structure",
   sizeof(*priv),
@@ -803,7 +790,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
err = ENOMEM;
goto port_error;
}
-
priv->ctx = ctx;
strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
sizeof(priv->ibdev_path));
@@ -821,7 +807,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
ERROR("ibv_query_device_ex() failed");
goto port_error;
}
-
config.hw_csum = !!(device_attr_ex.device_cap_flags_ex &
IBV_DEVICE_RAW_IP_CSUM);
DEBUG("checksum offloading is %ssupported",
@@ -857,7 +842,6 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 #endif
DEBUG("hardware RX end alignment padding is %ssupported",
  (config.hw_padding ? "" : "not "));
-
config.

[dpdk-dev] [PATCH v3 05/10] net/mlx5: add missing function documentation

2018-03-05 Thread Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_ethdev.c  | 18 ++
 drivers/net/mlx5/mlx5_mr.c  |  7 +--
 drivers/net/mlx5/mlx5_rxq.c | 20 
 drivers/net/mlx5/mlx5_trigger.c | 30 ++
 drivers/net/mlx5/mlx5_txq.c | 10 ++
 5 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0c383deba..9bbf1eb7d 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -435,6 +435,15 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
priv_unlock(priv);
 }
 
+/**
+ * Get supported packet types.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   A pointer to the supported Packet types array.
+ */
 const uint32_t *
 mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 {
@@ -467,6 +476,9 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, -1 on error.
  */
 static int
 mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev)
@@ -530,6 +542,9 @@ mlx5_link_update_unlocked_gset(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, -1 on error.
  */
 static int
 mlx5_link_update_unlocked_gs(struct rte_eth_dev *dev)
@@ -733,6 +748,9 @@ priv_force_link_status_change(struct priv *priv, int status)
  *   Pointer to Ethernet device structure.
  * @param wait_to_complete
  *   Wait for request completion (ignored).
+ *
+ * @return
+ *   0 on success, -1 on error.
  */
 int
 mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 4e1495800..8748ddcf5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -242,6 +242,7 @@ mlx5_mp2mr_iter(struct rte_mempool *mp, void *arg)
  *   Pointer to private structure.
  * @param mp
  *   Pointer to the memory pool to register.
+ *
  * @return
  *   The memory region on success.
  */
@@ -302,6 +303,7 @@ priv_mr_new(struct priv *priv, struct rte_mempool *mp)
  *   Pointer to private structure.
  * @param mp
  *   Pointer to the memory pool to register.
+ *
  * @return
  *   The memory region on success.
  */
@@ -352,9 +354,10 @@ priv_mr_release(struct priv *priv __rte_unused, struct 
mlx5_mr *mr)
  * Verify the flow list is empty
  *
  * @param priv
- *  Pointer to private structure.
+ *   Pointer to private structure.
  *
- * @return the number of object not released.
+ * @return
+ *   The number of object not released.
  */
 int
 priv_mr_verify(struct priv *priv)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2fc6e08aa..6924202cc 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -883,9 +883,10 @@ mlx5_priv_rxq_ibv_release(struct priv *priv, struct 
mlx5_rxq_ibv *rxq_ibv)
  * Verify the Verbs Rx queue list is empty
  *
  * @param priv
- *  Pointer to private structure.
+ *   Pointer to private structure.
  *
- * @return the number of object not released.
+ * @return
+ *   The number of object not released.
  */
 int
 mlx5_priv_rxq_ibv_verify(struct priv *priv)
@@ -1139,9 +1140,10 @@ mlx5_priv_rxq_releasable(struct priv *priv, uint16_t idx)
  * Verify the Rx Queue list is empty
  *
  * @param priv
- *  Pointer to private structure.
+ *   Pointer to private structure.
  *
- * @return the number of object not released.
+ * @return
+ *   The number of object not released.
  */
 int
 mlx5_priv_rxq_verify(struct priv *priv)
@@ -1293,9 +1295,10 @@ mlx5_priv_ind_table_ibv_release(struct priv *priv,
  * Verify the Rx Queue list is empty
  *
  * @param priv
- *  Pointer to private structure.
+ *   Pointer to private structure.
  *
- * @return the number of object not released.
+ * @return
+ *   The number of object not released.
  */
 int
 mlx5_priv_ind_table_ibv_verify(struct priv *priv)
@@ -1462,9 +1465,10 @@ mlx5_priv_hrxq_release(struct priv *priv, struct 
mlx5_hrxq *hrxq)
  * Verify the Rx Queue list is empty
  *
  * @param priv
- *  Pointer to private structure.
+ *   Pointer to private structure.
  *
- * @return the number of object not released.
+ * @return
+ *   The number of object not released.
  */
 int
 mlx5_priv_hrxq_ibv_verify(struct priv *priv)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 72e8ff644..b147fb4f8 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,12 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
+/**
+ * Stop traffic on Tx queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
 static void
 priv_txq_stop(struct priv *priv)
 {
@@ -23,6 +29,15 @@ priv_txq_stop(struct priv *priv)
mlx5_priv_txq_release(priv, i);
 }
 
+

[dpdk-dev] [PATCH v3 07/10] net/mlx5: remove control path locks

2018-03-05 Thread Nelio Laranjeiro
In the priv struct, only the memory region needs to be protected against
concurrent access between the control plane and the data plane.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  2 --
 drivers/net/mlx5/mlx5.h | 43 +-
 drivers/net/mlx5/mlx5_ethdev.c  | 58 +++--
 drivers/net/mlx5/mlx5_flow.c| 18 +
 drivers/net/mlx5/mlx5_mr.c  |  4 +--
 drivers/net/mlx5/mlx5_rss.c |  8 --
 drivers/net/mlx5/mlx5_rxq.c |  9 ---
 drivers/net/mlx5/mlx5_stats.c   | 15 +--
 drivers/net/mlx5/mlx5_trigger.c |  7 -
 drivers/net/mlx5/mlx5_txq.c |  5 
 drivers/net/mlx5/mlx5_vlan.c|  6 -
 11 files changed, 9 insertions(+), 166 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 91149ccee..872edab9d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -165,7 +165,6 @@ mlx5_dev_close(struct rte_eth_dev *dev)
unsigned int i;
int ret;
 
-   priv_lock(priv);
DEBUG("%p: closing device \"%s\"",
  (void *)dev,
  ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
@@ -227,7 +226,6 @@ mlx5_dev_close(struct rte_eth_dev *dev)
ret = priv_mr_verify(priv);
if (ret)
WARN("%p: some Memory Region still remain", (void *)priv);
-   priv_unlock(priv);
memset(priv, 0, sizeof(*priv));
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b65962df9..8e021544c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -148,7 +148,7 @@ struct priv {
LIST_HEAD(ind_tables, mlx5_ind_table_ibv) ind_tbls;
uint32_t link_speed_capa; /* Link speed capabilities. */
struct mlx5_xstats_ctrl xstats_ctrl; /* Extended stats control. */
-   rte_spinlock_t lock; /* Lock for control functions. */
+   rte_spinlock_t mr_lock; /* MR Lock. */
int primary_socket; /* Unix socket for primary process. */
void *uar_base; /* Reserved address space for UAR mapping */
struct rte_intr_handle intr_handle_socket; /* Interrupt handler. */
@@ -157,47 +157,6 @@ struct priv {
/* Context for Verbs allocator. */
 };
 
-/**
- * Lock private structure to protect it from concurrent access in the
- * control path.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static inline void
-priv_lock(struct priv *priv)
-{
-   rte_spinlock_lock(&priv->lock);
-}
-
-/**
- * Try to lock private structure to protect it from concurrent access in the
- * control path.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   1 if the lock is successfully taken; 0 otherwise.
- */
-static inline int
-priv_trylock(struct priv *priv)
-{
-   return rte_spinlock_trylock(&priv->lock);
-}
-
-/**
- * Unlock private structure.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static inline void
-priv_unlock(struct priv *priv)
-{
-   rte_spinlock_unlock(&priv->lock);
-}
-
 /* mlx5.c */
 
 int mlx5_getenv_int(const char *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5c43755d0..f0defc69d 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -269,18 +269,16 @@ priv_set_flags(struct priv *priv, unsigned int keep, 
unsigned int flags)
 }
 
 /**
- * Ethernet device configuration.
- *
- * Prepare the driver for a given number of TX and RX queues.
+ * DPDK callback for Ethernet device configuration.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value on failure.
  */
-static int
-dev_configure(struct rte_eth_dev *dev)
+int
+mlx5_dev_configure(struct rte_eth_dev *dev)
 {
struct priv *priv = dev->data->dev_private;
unsigned int rxqs_n = dev->data->nb_rx_queues;
@@ -362,28 +360,7 @@ dev_configure(struct rte_eth_dev *dev)
j = 0;
}
return 0;
-}
-
-/**
- * DPDK callback for Ethernet device configuration.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value on failure.
- */
-int
-mlx5_dev_configure(struct rte_eth_dev *dev)
-{
-   struct priv *priv = dev->data->dev_private;
-   int ret;
 
-   priv_lock(priv);
-   ret = dev_configure(dev);
-   assert(ret >= 0);
-   priv_unlock(priv);
-   return -ret;
 }
 
 /**
@@ -403,7 +380,6 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
char ifname[IF_NAMESIZE];
 
info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-   priv_lock(priv);
/* FIXME: we should ask the device for these values. */
info->min_rx_bufsize = 32;
info->max_rx_pktlen = 65536;
@@ -431,7 +407,6 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)

[dpdk-dev] [PATCH v3 09/10] net/mlx5: change non failing function return values

2018-03-05 Thread Nelio Laranjeiro
These functions return int although they are not supposed to fail,
resulting in unnecessary checks in their callers.
Some return an error value where it should be a boolean.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.h |  4 ++--
 drivers/net/mlx5/mlx5_mr.c  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 25 ++---
 drivers/net/mlx5/mlx5_socket.c  |  6 +-
 drivers/net/mlx5/mlx5_trigger.c |  6 +-
 drivers/net/mlx5/mlx5_txq.c | 17 ++---
 6 files changed, 22 insertions(+), 40 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 2cb463b62..86310404a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -242,7 +242,7 @@ int mlx5_vlan_offload_set(struct rte_eth_dev *dev, int 
mask);
 int mlx5_dev_start(struct rte_eth_dev *dev);
 void mlx5_dev_stop(struct rte_eth_dev *dev);
 int mlx5_traffic_enable(struct rte_eth_dev *dev);
-int mlx5_traffic_disable(struct rte_eth_dev *dev);
+void mlx5_traffic_disable(struct rte_eth_dev *dev);
 int mlx5_traffic_restart(struct rte_eth_dev *dev);
 
 /* mlx5_flow.c */
@@ -287,7 +287,7 @@ void mlx5_flow_delete_drop_queue(struct rte_eth_dev *dev);
 /* mlx5_socket.c */
 
 int mlx5_socket_init(struct rte_eth_dev *priv);
-int mlx5_socket_uninit(struct rte_eth_dev *priv);
+void mlx5_socket_uninit(struct rte_eth_dev *priv);
 void mlx5_socket_handle(struct rte_eth_dev *priv);
 int mlx5_socket_connect(struct rte_eth_dev *priv);
 
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index fe60dd132..5c4e68736 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -307,7 +307,7 @@ mlx5_mr_get(struct rte_eth_dev *dev, struct rte_mempool *mp)
  *   Pointer to memory region to release.
  *
  * @return
- *   0 on success, errno on failure.
+ *   1 while a reference on it exists, 0 when freed.
  */
 int
 mlx5_mr_release(struct mlx5_mr *mr)
@@ -321,7 +321,7 @@ mlx5_mr_release(struct mlx5_mr *mr)
rte_free(mr);
return 0;
}
-   return EBUSY;
+   return 1;
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a3b08a1a3..8e7693df2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -840,7 +840,7 @@ mlx5_rxq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
  *   Verbs Rx queue object.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   1 while a reference on it exists, 0 when freed.
  */
 int
 mlx5_rxq_ibv_release(struct mlx5_rxq_ibv *rxq_ibv)
@@ -867,7 +867,7 @@ mlx5_rxq_ibv_release(struct mlx5_rxq_ibv *rxq_ibv)
rte_free(rxq_ibv);
return 0;
}
-   return EBUSY;
+   return 1;
 }
 
 /**
@@ -1074,7 +1074,7 @@ mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx)
  *   TX queue index.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   1 while a reference on it exists, 0 when freed.
  */
 int
 mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)
@@ -1086,13 +1086,8 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)
return 0;
rxq_ctrl = container_of((*priv->rxqs)[idx], struct mlx5_rxq_ctrl, rxq);
assert(rxq_ctrl->priv);
-   if (rxq_ctrl->ibv) {
-   int ret;
-
-   ret = mlx5_rxq_ibv_release(rxq_ctrl->ibv);
-   if (!ret)
-   rxq_ctrl->ibv = NULL;
-   }
+   if (rxq_ctrl->ibv && !mlx5_rxq_ibv_release(rxq_ctrl->ibv))
+   rxq_ctrl->ibv = NULL;
DEBUG("%p: Rx queue %p: refcnt %d", (void *)dev,
  (void *)rxq_ctrl, rte_atomic32_read(&rxq_ctrl->refcnt));
if (rte_atomic32_dec_and_test(&rxq_ctrl->refcnt)) {
@@ -1101,7 +1096,7 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)
(*priv->rxqs)[idx] = NULL;
return 0;
}
-   return EBUSY;
+   return 1;
 }
 
 /**
@@ -1261,7 +1256,7 @@ mlx5_ind_table_ibv_get(struct rte_eth_dev *dev, uint16_t 
queues[],
  *   Indirection table to release.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   1 while a reference on it exists, 0 when freed.
  */
 int
 mlx5_ind_table_ibv_release(struct rte_eth_dev *dev,
@@ -1281,7 +1276,7 @@ mlx5_ind_table_ibv_release(struct rte_eth_dev *dev,
rte_free(ind_tbl);
return 0;
}
-   return EBUSY;
+   return 1;
 }
 
 /**
@@ -1439,7 +1434,7 @@ mlx5_hrxq_get(struct rte_eth_dev *dev, uint8_t *rss_key, 
uint8_t rss_key_len,
  *   Pointer to Hash Rx queue to release.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   1 while a reference on it exists, 0 when freed.
  */
 int
 mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq)
@@ -1454,7 +1449,7 @@ mlx5_hrxq_release(struct rte_eth_dev *dev, struct 
mlx5_hrxq *hrxq)
return 0;
}
claim_nonzero(mlx5_ind_table_ibv_release(dev, hrxq->ind_table));
-

[dpdk-dev] [PATCH v3 10/10] net/mlx5: standardize on negative errno values

2018-03-05 Thread Nelio Laranjeiro
Set rte_errno systematically as well.

Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  88 ++-
 drivers/net/mlx5/mlx5_ethdev.c  | 231 -
 drivers/net/mlx5/mlx5_flow.c| 317 +++-
 drivers/net/mlx5/mlx5_mac.c |  33 +++--
 drivers/net/mlx5/mlx5_mr.c  |  15 +-
 drivers/net/mlx5/mlx5_rss.c |  50 ---
 drivers/net/mlx5/mlx5_rxmode.c  |  28 +++-
 drivers/net/mlx5/mlx5_rxq.c | 142 ++
 drivers/net/mlx5/mlx5_socket.c  |  82 +++
 drivers/net/mlx5/mlx5_stats.c   |  53 +--
 drivers/net/mlx5/mlx5_trigger.c |  89 ++-
 drivers/net/mlx5/mlx5_txq.c |  54 ---
 drivers/net/mlx5/mlx5_vlan.c|  24 +--
 13 files changed, 719 insertions(+), 487 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b6211e9c1..10da7a283 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -108,7 +108,7 @@ mlx5_getenv_int(const char *name)
  *   A pointer to the callback data.
  *
  * @return
- *   a pointer to the allocate space.
+ *   Allocated buffer, NULL otherwise and rte_errno is set.
  */
 static void *
 mlx5_alloc_verbs_buf(size_t size, void *data)
@@ -130,6 +130,8 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
}
assert(data != NULL);
ret = rte_malloc_socket(__func__, size, alignment, socket);
+   if (!ret && size)
+   rte_errno = ENOMEM;
DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
return ret;
 }
@@ -365,7 +367,7 @@ mlx5_dev_idx(struct rte_pci_addr *pci_addr)
  *   User data.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx5_args_check(const char *key, const char *val, void *opaque)
@@ -376,8 +378,9 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
errno = 0;
tmp = strtoul(val, NULL, 0);
if (errno) {
+   rte_errno = errno;
WARN("%s: \"%s\" is not a valid integer", key, val);
-   return errno;
+   return -rte_errno;
}
if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0) {
config->cqe_comp = !!tmp;
@@ -397,7 +400,8 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
config->rx_vec_en = !!tmp;
} else {
WARN("%s: unknown parameter", key);
-   return -EINVAL;
+   rte_errno = EINVAL;
+   return -rte_errno;
}
return 0;
 }
@@ -411,7 +415,7 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
  *   Device arguments structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs)
@@ -442,9 +446,10 @@ mlx5_args(struct mlx5_dev_config *config, struct 
rte_devargs *devargs)
if (rte_kvargs_count(kvlist, params[i])) {
ret = rte_kvargs_process(kvlist, params[i],
 mlx5_args_check, config);
-   if (ret != 0) {
+   if (ret) {
+   rte_errno = EINVAL;
rte_kvargs_free(kvlist);
-   return ret;
+   return -rte_errno;
}
}
}
@@ -470,7 +475,7 @@ static void *uar_base;
  *   Pointer to Ethernet device.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx5_uar_init_primary(struct rte_eth_dev *dev)
@@ -479,7 +484,6 @@ mlx5_uar_init_primary(struct rte_eth_dev *dev)
void *addr = (void *)0;
int i;
const struct rte_mem_config *mcfg;
-   int ret;
 
if (uar_base) { /* UAR address space mapped. */
priv->uar_base = uar_base;
@@ -501,8 +505,8 @@ mlx5_uar_init_primary(struct rte_eth_dev *dev)
if (addr == MAP_FAILED) {
ERROR("Failed to reserve UAR address space, please adjust "
  "MLX5_UAR_SIZE or try --base-virtaddr");
-   ret = ENOMEM;
-   return ret;
+   rte_errno = ENOMEM;
+   return -rte_errno;
}
/* Accept either same addr or a new addr returned from mmap if target
 * range occupied.
@@ -521,14 +525,13 @@ mlx5_uar_init_primary(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx5_uar_init_secondary(struct rte_eth_dev *dev)
 {

Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2018-03-05 Thread longtb5
Hi Dave,

Unfortunately I do not have access to our server's BIOS settings. The power
management task for our appliance is also pending; I'm expecting to return
to this task in April. Maybe we can still work out a patch before 18.05 (not
sure about the DPDK roadmap).

Regards,
-BL 

> -Original Message-
> From: david.h...@intel.com [mailto:david.h...@intel.com]
> Sent: Monday, March 5, 2018 6:26 PM
> To: long...@viettel.com.vn; dev@dpdk.org
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 

[dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD

2018-03-05 Thread Vipin Varghese
dpdk-pdump makes use of LIBRTE_PMD_PCAP for interfacing the ring to
the device-queue pair. Update the Makefile to check for it.

Signed-off-by: Vipin Varghese 
---
 app/pdump/Makefile | 4 
 1 file changed, 4 insertions(+)

diff --git a/app/pdump/Makefile b/app/pdump/Makefile
index bd3c208..038a34f 100644
--- a/app/pdump/Makefile
+++ b/app/pdump/Makefile
@@ -3,6 +3,10 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
+ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),n)
+$(error "Please enable CONFIG_RTE_LIBRTE_PMD_PCAP")
+endif
+
 ifeq ($(CONFIG_RTE_LIBRTE_PDUMP),y)
 
 APP = dpdk-pdump
-- 
1.9.1



[dpdk-dev] [PATCH 0/2] net/mlx5: convert to dynamic logs

2018-03-05 Thread Nelio Laranjeiro
This series applies on top of [1]

[1] https://dpdk.org/dev/patchwork/patch/35650/

Nelio Laranjeiro (2):
  net/mlx5: use port id in PMD log
  net/mlx5: use dynamic logging

 drivers/net/mlx5/mlx5.c | 227 +++-
 drivers/net/mlx5/mlx5_ethdev.c  | 112 ++--
 drivers/net/mlx5/mlx5_flow.c|  97 +-
 drivers/net/mlx5/mlx5_mac.c |  11 +-
 drivers/net/mlx5/mlx5_mr.c  |  77 ++-
 drivers/net/mlx5/mlx5_rxmode.c  |  16 +--
 drivers/net/mlx5/mlx5_rxq.c | 285 +++-
 drivers/net/mlx5/mlx5_rxtx.h|  20 +--
 drivers/net/mlx5/mlx5_socket.c  |  58 +---
 drivers/net/mlx5/mlx5_stats.c   |  35 +++--
 drivers/net/mlx5/mlx5_trigger.c |  28 ++--
 drivers/net/mlx5/mlx5_txq.c | 153 -
 drivers/net/mlx5/mlx5_utils.h   |  27 ++--
 drivers/net/mlx5/mlx5_vlan.c|  21 +--
 14 files changed, 697 insertions(+), 470 deletions(-)

-- 
2.11.0



[dpdk-dev] [PATCH 1/2] net/mlx5: use port id in PMD log

2018-03-05 Thread Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c |  79 +---
 drivers/net/mlx5/mlx5_ethdev.c  |  86 +
 drivers/net/mlx5/mlx5_flow.c|  82 +---
 drivers/net/mlx5/mlx5_mac.c |   9 +-
 drivers/net/mlx5/mlx5_mr.c  |  58 +++-
 drivers/net/mlx5/mlx5_rxmode.c  |  16 ++--
 drivers/net/mlx5/mlx5_rxq.c | 201 ++--
 drivers/net/mlx5/mlx5_rxtx.h|   7 +-
 drivers/net/mlx5/mlx5_socket.c  |  47 ++
 drivers/net/mlx5/mlx5_stats.c   |  29 +++---
 drivers/net/mlx5/mlx5_trigger.c |  26 +++---
 drivers/net/mlx5/mlx5_txq.c | 125 ++---
 drivers/net/mlx5/mlx5_vlan.c|  21 +++--
 13 files changed, 446 insertions(+), 340 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 10da7a283..2edd66f2e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -132,7 +132,6 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
ret = rte_malloc_socket(__func__, size, alignment, socket);
if (!ret && size)
rte_errno = ENOMEM;
-   DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
return ret;
 }
 
@@ -148,7 +147,6 @@ static void
 mlx5_free_verbs_buf(void *ptr, void *data __rte_unused)
 {
assert(data != NULL);
-   DEBUG("Extern free request: %p", ptr);
rte_free(ptr);
 }
 
@@ -167,8 +165,8 @@ mlx5_dev_close(struct rte_eth_dev *dev)
unsigned int i;
int ret;
 
-   DEBUG("%p: closing device \"%s\"",
- (void *)dev,
+   DEBUG("port %u closing device \"%s\"",
+ dev->data->port_id,
  ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
/* In case mlx5_dev_stop() has not been called. */
mlx5_dev_interrupt_handler_uninstall(dev);
@@ -206,28 +204,35 @@ mlx5_dev_close(struct rte_eth_dev *dev)
mlx5_socket_uninit(dev);
ret = mlx5_hrxq_ibv_verify(dev);
if (ret)
-   WARN("%p: some Hash Rx queue still remain", (void *)dev);
+   WARN("port %u some hash Rx queue still remain",
+dev->data->port_id);
ret = mlx5_ind_table_ibv_verify(dev);
if (ret)
-   WARN("%p: some Indirection table still remain", (void *)dev);
+   WARN("port %u some indirection table still remain",
+dev->data->port_id);
ret = mlx5_rxq_ibv_verify(dev);
if (ret)
-   WARN("%p: some Verbs Rx queue still remain", (void *)dev);
+   WARN("port %u some Verbs Rx queue still remain",
+dev->data->port_id);
ret = mlx5_rxq_verify(dev);
if (ret)
-   WARN("%p: some Rx Queues still remain", (void *)dev);
+   WARN("port %u some Rx queues still remain",
+dev->data->port_id);
ret = mlx5_txq_ibv_verify(dev);
if (ret)
-   WARN("%p: some Verbs Tx queue still remain", (void *)dev);
+   WARN("port %u some Verbs Tx queue still remain",
+dev->data->port_id);
ret = mlx5_txq_verify(dev);
if (ret)
-   WARN("%p: some Tx Queues still remain", (void *)dev);
+   WARN("port %u some Tx queues still remain",
+dev->data->port_id);
ret = mlx5_flow_verify(dev);
if (ret)
-   WARN("%p: some flows still remain", (void *)dev);
+   WARN("port %u some flows still remain", dev->data->port_id);
ret = mlx5_mr_verify(dev);
if (ret)
-   WARN("%p: some Memory Region still remain", (void *)dev);
+   WARN("port %u some memory region still remain",
+dev->data->port_id);
memset(priv, 0, sizeof(*priv));
 }
 
@@ -503,15 +508,17 @@ mlx5_uar_init_primary(struct rte_eth_dev *dev)
addr = mmap(addr, MLX5_UAR_SIZE,
PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
-   ERROR("Failed to reserve UAR address space, please adjust "
- "MLX5_UAR_SIZE or try --base-virtaddr");
+   ERROR("port %u failed to reserve UAR address space, please"
+ " adjust MLX5_UAR_SIZE or try --base-virtaddr",
+ dev->data->port_id);
rte_errno = ENOMEM;
return -rte_errno;
}
/* Accept either same addr or a new addr returned from mmap if target
 * range occupied.
 */
-   INFO("Reserved UAR address space: %p", addr);
+   INFO("port %u reserved UAR address space: %p", dev->data->port_id,
+addr);
priv->uar_base = addr; /* for primary and secondary UAR re-mmap. */
uar_base = addr; /* process local, don't reserve again. */
return 0;
@@ -542,20 +549,21 @@ mlx5_uar_init_secondary(struct rte_eth_dev *de

[dpdk-dev] [PATCH 2/2] net/mlx5: use dynamic logging

2018-03-05 Thread Nelio Laranjeiro
Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c | 238 ++--
 drivers/net/mlx5/mlx5_ethdev.c  | 118 +---
 drivers/net/mlx5/mlx5_flow.c| 109 ---
 drivers/net/mlx5/mlx5_mac.c |  12 +-
 drivers/net/mlx5/mlx5_mr.c  |  85 ++--
 drivers/net/mlx5/mlx5_rxmode.c  |  16 +--
 drivers/net/mlx5/mlx5_rxq.c | 298 ++--
 drivers/net/mlx5/mlx5_rxtx.h|  17 +--
 drivers/net/mlx5/mlx5_socket.c  |  65 +
 drivers/net/mlx5/mlx5_stats.c   |  38 ++---
 drivers/net/mlx5/mlx5_trigger.c |  30 ++--
 drivers/net/mlx5/mlx5_txq.c | 164 --
 drivers/net/mlx5/mlx5_utils.h   |  27 ++--
 drivers/net/mlx5/mlx5_vlan.c|  24 ++--
 14 files changed, 681 insertions(+), 560 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2edd66f2e..4300bafb7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -77,6 +77,9 @@
 #define MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP (1 << 4)
 #endif
 
+/** Driver-specific log messages type. */
+int mlx5_logtype;
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -165,9 +168,9 @@ mlx5_dev_close(struct rte_eth_dev *dev)
unsigned int i;
int ret;
 
-   DEBUG("port %u closing device \"%s\"",
- dev->data->port_id,
- ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+   DRV_LOG(DEBUG, "port %u closing device \"%s\"",
+   dev->data->port_id,
+   ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
/* In case mlx5_dev_stop() has not been called. */
mlx5_dev_interrupt_handler_uninstall(dev);
mlx5_traffic_disable(dev);
@@ -204,35 +207,36 @@ mlx5_dev_close(struct rte_eth_dev *dev)
mlx5_socket_uninit(dev);
ret = mlx5_hrxq_ibv_verify(dev);
if (ret)
-   WARN("port %u some hash Rx queue still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some hash Rx queue still remain",
+   dev->data->port_id);
ret = mlx5_ind_table_ibv_verify(dev);
if (ret)
-   WARN("port %u some indirection table still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some indirection table still remain",
+   dev->data->port_id);
ret = mlx5_rxq_ibv_verify(dev);
if (ret)
-   WARN("port %u some Verbs Rx queue still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some Verbs Rx queue still remain",
+   dev->data->port_id);
ret = mlx5_rxq_verify(dev);
if (ret)
-   WARN("port %u some Rx queues still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some Rx queues still remain",
+   dev->data->port_id);
ret = mlx5_txq_ibv_verify(dev);
if (ret)
-   WARN("port %u some Verbs Tx queue still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
+   dev->data->port_id);
ret = mlx5_txq_verify(dev);
if (ret)
-   WARN("port %u some Tx queues still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some Tx queues still remain",
+   dev->data->port_id);
ret = mlx5_flow_verify(dev);
if (ret)
-   WARN("port %u some flows still remain", dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some flows still remain",
+   dev->data->port_id);
ret = mlx5_mr_verify(dev);
if (ret)
-   WARN("port %u some memory region still remain",
-dev->data->port_id);
+   DRV_LOG(WARNING, "port %u some memory region still remain",
+   dev->data->port_id);
memset(priv, 0, sizeof(*priv));
 }
 
@@ -384,7 +388,7 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
tmp = strtoul(val, NULL, 0);
if (errno) {
rte_errno = errno;
-   WARN("%s: \"%s\" is not a valid integer", key, val);
+   DRV_LOG(WARNING, "%s: \"%s\" is not a valid integer", key, val);
return -rte_errno;
}
if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0) {
@@ -404,7 +408,7 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
} else if (strcmp(MLX5_RX_VEC_EN, key) == 0) {
config->rx_vec_en = !!tmp;
} else {
-   WARN("%s: unknown parameter", key);
+   DRV_LOG(WARNING, "%s: unknown parameter", key);
rte_errno = EINVAL;
return -rte_errno;
}
@@ -508,17 +512,18 @@ mlx5_

Re: [dpdk-dev] [PATCH] net/null: Different mac address support

2018-03-05 Thread Ferruh Yigit
On 2/3/2018 2:11 AM, Mallesh Koujalagi wrote:
> After attaching two null devices to OVS, both show the "00.00.00.00.00.00"
> mac address. Fix this issue by setting a different mac address for each
> device.
> 
> Signed-off-by: Mallesh Koujalagi 
> ---
>  drivers/net/null/rte_eth_null.c | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> index 9385ffd..98ac115 100644
> --- a/drivers/net/null/rte_eth_null.c
> +++ b/drivers/net/null/rte_eth_null.c
> @@ -85,8 +85,17 @@ struct pmd_internals {
>   uint8_t rss_key[40];/**< 40-byte hash key. */
>  };
>  
> +static struct ether_addr base_eth_addr = {
> + .addr_bytes = {
> + 0x4E /* N */,
> + 0x55 /* U */,
> + 0x4C /* L */,
> + 0x4C /* L */,
> + 0x00,
> + 0x00
> + }
> +};
>  
> -static struct ether_addr eth_addr = { .addr_bytes = {0} };
>  static struct rte_eth_link pmd_link = {
>   .link_speed = ETH_SPEED_NUM_10G,
>   .link_duplex = ETH_LINK_FULL_DUPLEX,
> @@ -492,6 +501,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
>   struct rte_eth_dev_data *data = NULL;
>   struct pmd_internals *internals = NULL;
>   struct rte_eth_dev *eth_dev = NULL;
> + struct ether_addr *eth_addr = NULL;
>  
>   static const uint8_t default_rss_key[40] = {
>   0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2, 0x41, 0x67, 
> 0x25, 0x3D,
> @@ -519,6 +529,15 @@ eth_dev_null_create(struct rte_vdev_device *dev,
>   rte_free(data);
>   return -ENOMEM;
>   }
> + eth_addr = rte_zmalloc_socket(rte_vdev_device_name(dev),
> + sizeof(*eth_addr), 0, dev->device.numa_node);

Why not put "struct ether_addr" into "struct pmd_internals" as ring pmd does?
This saves from extra memory allocation and error recovery complexity.

> + if (eth_addr == NULL) {
> + rte_eth_dev_release_port(eth_dev);
> + rte_free(data);

Need to free data->dev_private which has been allocated by 
rte_eth_vdev_allocate()

And note rte_eth_vdev_allocate() should be done after that since it memset the 
data.

Also needs to free eth_addr in rte_pmd_null_remove()
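
For illustration, a minimal sketch of the embedding approach suggested above, with the MAC address kept inside the driver's private data as the ring PMD does, so no separate allocation, error recovery or remove-time free is needed. Struct and helper names below are simplified placeholders, not the submitted patch:

    #include <stdint.h>
    #include <rte_ether.h>
    #include <rte_ethdev.h>

    /* Sketch only: the MAC address lives in dev_private and is freed with it. */
    struct null_internals_sketch {
        struct ether_addr eth_addr;    /* embedded, no extra rte_zmalloc/rte_free */
        /* ... other per-device fields ... */
    };

    static void
    null_assign_mac(struct null_internals_sketch *internals,
                    struct rte_eth_dev_data *data)
    {
        static const struct ether_addr base = {
            .addr_bytes = { 0x4E, 0x55, 0x4C, 0x4C, 0x00, 0x00 } /* "NULL" */
        };

        internals->eth_addr = base;
        internals->eth_addr.addr_bytes[5] = (uint8_t)data->port_id;
        data->mac_addrs = &internals->eth_addr;
    }
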

> + return -ENOMEM;
> + }
> + *eth_addr = base_eth_addr;
> + eth_addr->addr_bytes[5] = eth_dev->data->port_id;
>  
>   /* now put it all together
>* - store queue data in internals,
> @@ -543,7 +562,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
>   data->nb_rx_queues = (uint16_t)nb_rx_queues;
>   data->nb_tx_queues = (uint16_t)nb_tx_queues;
>   data->dev_link = pmd_link;
> - data->mac_addrs = ð_addr;
> + data->mac_addrs = eth_addr;
>  
>   eth_dev->data = data;
>   eth_dev->dev_ops = &ops;
> 



Re: [dpdk-dev] [PATCH] compressdev: implement API

2018-03-05 Thread Verma, Shally


>-Original Message-
>From: Ahmed Mansour [mailto:ahmed.mans...@nxp.com]
>Sent: 03 March 2018 01:19
>To: Trahe, Fiona; Verma, Shally; dev@dpdk.org
>Cc: De Lara Guarch, Pablo; Athreya, Narayana Prasad; Gupta, Ashish;
>Sahu, Sunila; Challa, Mahipal; Jain, Deepak K; Hemant Agrawal;
>Roy Pledge; Youri Querry
>Subject: Re: [dpdk-dev] [PATCH] compressdev: implement API
>
>On 3/2/2018 4:53 AM, Trahe, Fiona wrote:
>>
>>> On 3/1/2018 9:41 AM, Trahe, Fiona wrote:
 Hi Shally

 //snip//
> [Shally] This looks better to me. So it means the app would always call
> xform_init() for stateless and attach an updated priv_xform to ops
> (depending upon whether it's shareable or not). So it does not need to
> have a NULL pointer on priv_xform, right?
>
 [Fiona] yes. The PMD must return a valid priv_xform pointer.
>>> [Ahmed] What I understood is that the xform_init will be called once
>>> initially. if the @flag returned is NONE_SHAREABLE then the application
>>> must not attach two inflight ops to the same @priv_xform? Otherwise the
>>> application can attach many ops in flight to the @priv_xform?
>> [Fiona] Yes. App calls the xform_init() once on a device where it plans to 
>> send stateless ops.
>> If PMD returns shareable, then it doesn't need to call again and can attach 
>> this to every stateless op going to that device.
>> If PMD returns SINGLE_OP then it must call xform_init() before every other
>> stateless op it wants to have inflight simultaneously. This does not mean it 
>> must be called before every op,
>> but probably will set up a batch of priv_xforms  - it can reuse each 
>> priv_xform once the op finishes with it.
>[Ahmed] @Shally Can this complexity of managing the NONE_SHAREABLE mode
>be pushed into the PMD? A flexible stockpile can be kept and maintained
>by the PMD and it can be increased or decreased based on
>low-water/high-water thresholds
[Shally] It is doable to manage within PMD but need to do hands on to evaluate 
effectiveness. So far, we have never exercised this way and left it to 
application to attach different session (or stream) to op for maximum 
performance gain. So, I would say, may it be ok to have the flag feature in the 
first place and deprecate it later, if it is not required?! Or just have the API 
without any 
flag option and add a feature flag to indicate PMD support for 
SHAREABLE/NON-SHAREABLE xform_priv handle?!
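
To illustrate the behaviour being discussed, here is pseudo-code only; the function and flag names are placeholders, not the compressdev API (which is still being defined in this thread). With a shareable priv_xform one handle serves every stateless op, while with a non-shareable one the application keeps a small batch of handles, one per op it wants in flight, and recycles them as ops complete:

    #include <stddef.h>

    #define NB_INFLIGHT 16

    struct app_xform_pool {
        void *priv_xform[NB_INFLIGHT];   /* handles returned by xform_init() */
        unsigned int nb;                 /* number of usable handles */
    };

    static int
    app_xform_pool_init(struct app_xform_pool *pool, int shareable,
                        void *(*xform_init)(void))   /* placeholder initializer */
    {
        unsigned int i;

        /* shareable: one handle attached to every stateless op;
         * non-shareable: one handle per op that may be in flight. */
        pool->nb = shareable ? 1 : NB_INFLIGHT;
        for (i = 0; i < pool->nb; i++) {
            pool->priv_xform[i] = xform_init();
            if (pool->priv_xform[i] == NULL)
                return -1;
        }
        return 0;
    }
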



Re: [dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD

2018-03-05 Thread Ferruh Yigit
On 3/5/2018 7:57 AM, Vipin Varghese wrote:
> dpdk-pdump makes use of LIBRTE_PMD_PCAP for interfacing the ring to
> the device-queue pair. Updating Makefile to check for the same.
> 
> Signed-off-by: Vipin Varghese 
> ---
>  app/pdump/Makefile | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/app/pdump/Makefile b/app/pdump/Makefile
> index bd3c208..038a34f 100644
> --- a/app/pdump/Makefile
> +++ b/app/pdump/Makefile
> @@ -3,6 +3,10 @@
>  
>  include $(RTE_SDK)/mk/rte.vars.mk
>  
> +ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),n)
> +$(error "Please enable CONFIG_RTE_LIBRTE_PMD_PCAP")
> +endif

pdump is enabled default, so won't this break the default build?

What about moving this to lib/librte_pdump, convert $(error ..) to $(warning ..)
and disable CONFIG_RTE_LIBRTE_PDUMP there?

> +
>  ifeq ($(CONFIG_RTE_LIBRTE_PDUMP),y)
>  
>  APP = dpdk-pdump
> 



[dpdk-dev] [PATCH v3 4/7] app/testpmd: introduce VXLAN GPE to csum forwarding engine

2018-03-05 Thread Xueming Li
This patch introduces VXLAN-GPE support to the csum forwarding engine by
recognizing the VXLAN-GPE UDP port and parsing the tunnel payload according
to the next-protocol type.

Signed-off-by: Xueming Li 
---
 app/test-pmd/csumonly.c   | 96 +--
 app/test-pmd/parameters.c | 12 -
 app/test-pmd/testpmd.h|  2 +
 doc/guides/testpmd_app_ug/run_app.rst |  5 ++
 4 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 00ec40d58..526d28e74 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -56,6 +56,10 @@
 #define GRE_SUPPORTED_FIELDS   (GRE_CHECKSUM_PRESENT | GRE_KEY_PRESENT |\
 GRE_SEQUENCE_PRESENT)
 
+#define VXLAN_GPE_TYPE_IPv4 1
+#define VXLAN_GPE_TYPE_IPv6 2
+#define VXLAN_GPE_TYPE_ETH 3
+
 /* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 #define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
@@ -63,6 +67,8 @@
 #define _htons(x) (x)
 #endif
 
+uint16_t vxlan_gpe_udp_port = 4790;
+
 /* structure that caches offload info for the current packet */
 struct testpmd_offload_info {
uint16_t ethertype;
@@ -87,6 +93,14 @@ struct simple_gre_hdr {
uint16_t proto;
 } __attribute__((__packed__));
 
+/* simplified VXLAN-GPE header */
+struct vxlan_gpe_hdr {
+   uint8_t vx_flags; /**< flag (8). */
+   uint8_t reserved[2]; /**< Reserved (16). */
+   uint8_t proto; /**< next-protocol (8). */
+   uint32_t vx_vni;   /**< VNI (24) + Reserved (8). */
+} __attribute__((__packed__));
+
 static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
@@ -197,6 +211,70 @@ parse_vxlan(struct udp_hdr *udp_hdr,
info->l2_len += ETHER_VXLAN_HLEN; /* add udp + vxlan */
 }
 
+/* Parse a vxlan-gpe header */
+static void
+parse_vxlan_gpe(struct udp_hdr *udp_hdr,
+   struct testpmd_offload_info *info)
+{
+   struct ether_hdr *eth_hdr;
+   struct ipv4_hdr *ipv4_hdr;
+   struct ipv6_hdr *ipv6_hdr;
+   struct vxlan_gpe_hdr *vxlan_gpe_hdr;
+   uint8_t vxlan_gpe_len = sizeof(*vxlan_gpe_hdr);
+
+   /* check udp destination port, 4790 is the default vxlan-gpe port */
+   if (udp_hdr->dst_port != _htons(vxlan_gpe_udp_port))
+   return;
+
+   vxlan_gpe_hdr = (struct vxlan_gpe_hdr *)((char *)udp_hdr +
+   sizeof(struct udp_hdr));
+
+   if (!vxlan_gpe_hdr->proto || vxlan_gpe_hdr->proto ==
+   VXLAN_GPE_TYPE_IPv4) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+   info->outer_l4_proto = info->l4_proto;
+
+   ipv4_hdr = (struct ipv4_hdr *)((char *)vxlan_gpe_hdr +
+  vxlan_gpe_len);
+
+   parse_ipv4(ipv4_hdr, info);
+   info->ethertype = _htons(ETHER_TYPE_IPv4);
+   info->l2_len = 0;
+
+   } else if (vxlan_gpe_hdr->proto == VXLAN_GPE_TYPE_IPv6) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+   info->outer_l4_proto = info->l4_proto;
+
+   ipv6_hdr = (struct ipv6_hdr *)((char *)vxlan_gpe_hdr +
+  vxlan_gpe_len);
+
+   info->ethertype = _htons(ETHER_TYPE_IPv6);
+   parse_ipv6(ipv6_hdr, info);
+   info->l2_len = 0;
+
+   } else if (vxlan_gpe_hdr->proto == VXLAN_GPE_TYPE_ETH) {
+   info->is_tunnel = 1;
+   info->outer_ethertype = info->ethertype;
+   info->outer_l2_len = info->l2_len;
+   info->outer_l3_len = info->l3_len;
+   info->outer_l4_proto = info->l4_proto;
+
+   eth_hdr = (struct ether_hdr *)((char *)vxlan_gpe_hdr +
+ vxlan_gpe_len);
+
+   parse_ethernet(eth_hdr, info);
+   } else
+   return;
+
+   info->l2_len += ETHER_VXLAN_HLEN;
+}
+
 /* Parse a gre header */
 static void
 parse_gre(struct simple_gre_hdr *gre_hdr, struct testpmd_offload_info *info)
@@ -591,6 +669,10 @@ pkt_copy_split(const struct rte_mbuf *pkt)
  *   Ether / (vlan) / IP|IP6 / UDP|TCP|SCTP .
  *   Ether / (vlan) / outer IP|IP6 / outer UDP / VxLAN / Ether / IP|IP6 /
  *   UDP|TCP|SCTP
+ *   Ether / (vlan) / outer IP|IP6 / outer UDP / VXLAN-GPE / Ether / IP|IP6 /
+ *   UDP|TCP|SCTP
+ *   Ether / (vlan) / outer IP|IP6 / outer UDP / VXLAN-GPE / IP|IP6 /
+ *   UDP|TCP|SCTP
  *   Ether / (vlan) / outer IP|IP6 / GRE / Ether / IP|IP6 / UDP|TCP|SCTP
  *   Ether / (vlan) / outer IP|IP6 / GRE / IP|IP6 / UDP|TCP|SCTP
  *   Ether / (vlan) / outer IP|IP6 / IP|IP6 / UDP|

Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/7] ethdev: fix port data reset timing

2018-03-05 Thread Matan Azrad
HI

From: Ferruh Yigit, Sent: Monday, March 5, 2018 1:24 PM
> On 1/18/2018 4:35 PM, Matan Azrad wrote:
> > rte_eth_dev_data structure is allocated per ethdev port and can be
> > used to get a data of the port internally.
> >
> > rte_eth_dev_attach_secondary tries to find the port identifier using
> > rte_eth_dev_data name field comparison and may get an identifier of
> > invalid port in case of this port was released by the primary process
> > because the port release API doesn't reset the port data.
> >
> > So, it will be better to reset the port data in release time instead
> > of allocation time.
> >
> > Move the port data reset to the port release API.
> >
> > Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple
> > process model")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Matan Azrad 
> > ---
> >  lib/librte_ether/rte_ethdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c
> > b/lib/librte_ether/rte_ethdev.c index 7044159..156231c 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -204,7 +204,6 @@ struct rte_eth_dev *
> > return NULL;
> > }
> >
> > -   memset(&rte_eth_dev_data[port_id], 0, sizeof(struct
> rte_eth_dev_data));
> > eth_dev = eth_dev_get(port_id);
> > snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
> "%s", name);
> > eth_dev->data->port_id = port_id;
> > @@ -252,6 +251,7 @@ struct rte_eth_dev *
> > if (eth_dev == NULL)
> > return -EINVAL;
> >
> > +   memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> 
> Hi Matan,
> 
> What most of the vdev release path does is:
> 
> eth_dev = rte_eth_dev_allocated(...)
> rte_free(eth_dev->data->dev_private);
> rte_free(eth_dev->data);
> rte_eth_dev_release_port(eth_dev);
> 
> Since eth_dev->data freed, memset() it in rte_eth_dev_release_port() will
> be problem.
> 
> We don't run remove path that is why we didn't hit the issue but this seems
> problem for all virtual PMDs.

Yes, it is a problem and should be fixed:
For vdevs which use private rte_eth_dev_data the remove order can be:
private_data = eth_dev->data;
rte_free(eth_dev->data->dev_private);
rte_eth_dev_release_port(eth_dev); /* The last operation working on 
ethdev structure. */
rte_free(private_data);


> Also rte_eth_dev_pci_release() looks problematic now.

Yes, again, the last operation working on ethdev structure should be 
rte_eth_dev_release_port().

So need to fix all vdevs and the rte_eth_dev_pci_release() function.

Any comments?
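
A minimal sketch (not an actual driver) of the remove order described above, for a vdev that allocated its own rte_eth_dev_data: keep a reference to the data, free dev_private, release the port, and only then free the detached data, so rte_eth_dev_release_port() never touches freed memory:

    #include <rte_ethdev.h>
    #include <rte_bus_vdev.h>
    #include <rte_malloc.h>

    static int
    example_vdev_remove(struct rte_vdev_device *dev)
    {
        struct rte_eth_dev *eth_dev;
        struct rte_eth_dev_data *data;

        eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
        if (eth_dev == NULL)
            return -1;

        data = eth_dev->data;                 /* keep before release */
        rte_free(data->dev_private);
        rte_eth_dev_release_port(eth_dev);    /* last use of the ethdev entry */
        rte_free(data);                       /* private data freed afterwards */

        return 0;
    }
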


[dpdk-dev] [PATCH v3 1/7] ethdev: introduce Tx generic tunnel L3/L4 offload

2018-03-05 Thread Xueming Li
This patch introduces new TX offload flags for devices that support
tunnel-agnostic L3/L4 checksum and TSO offload.

The support from the device is for inner and outer checksums on
IPV4/TCP/UDP and TSO for *any packet with the following format*:

< some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] / <tunnel header> / [optional inner IPv4/IPv6] / [optional TCP/UDP]

For example the following packets can use this feature:

1. eth / ipv4 / udp / VXLAN / ip / tcp
2. eth / ipv4 / GRE / MPLS / ipv4 / udp

Signed-off-by: Xueming Li 
---
 lib/librte_ether/rte_ethdev.h | 24 
 lib/librte_mbuf/rte_mbuf.c|  5 +
 lib/librte_mbuf/rte_mbuf.h| 18 --
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 036153306..66d12d3e0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -980,6 +980,30 @@ struct rte_eth_conf {
  *   the same mempool and has refcnt = 1.
  */
 #define DEV_TX_OFFLOAD_SECURITY 0x0002
+/**< Generic tunnel L3/L4 checksum offload. To enable this offload feature
+ * for a packet to be transmitted on hardware supporting generic tunnel L3/L4
+ * checksum offload:
+ *  - fill outer_l2_len and outer_l3_len in mbuf
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if undefined)
+ *  - set the flags PKT_TX_OUTER_IP_CKSUM
+ *  - set the flags PKT_TX_IP_CKSUM
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ */
+#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM   0x0004
+/**< Generic tunnel segmentation offload. To enable it, the user needs to:
+ *  - fill outer_l2_len and outer_l3_len in mbuf
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN if undefined)
+ *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
+ *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP
+ *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
+ * Hardware that supports generic tunnel TSO offload only update outer/inner
+ * L3/L4 fields, tunnel fields are not touched.
+ */
+#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO 0x0008
 
 /*
  * If new Tx offload capabilities are defined, they also must be
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..c139d5b30 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -400,11 +400,13 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
case PKT_TX_OUTER_IP_CKSUM: return "PKT_TX_OUTER_IP_CKSUM";
case PKT_TX_OUTER_IPV4: return "PKT_TX_OUTER_IPV4";
case PKT_TX_OUTER_IPV6: return "PKT_TX_OUTER_IPV6";
+   case PKT_TX_OUTER_UDP: return "PKT_TX_OUTER_UDP";
case PKT_TX_TUNNEL_VXLAN: return "PKT_TX_TUNNEL_VXLAN";
case PKT_TX_TUNNEL_GRE: return "PKT_TX_TUNNEL_GRE";
case PKT_TX_TUNNEL_IPIP: return "PKT_TX_TUNNEL_IPIP";
case PKT_TX_TUNNEL_GENEVE: return "PKT_TX_TUNNEL_GENEVE";
case PKT_TX_TUNNEL_MPLSINUDP: return "PKT_TX_TUNNEL_MPLSINUDP";
+   case PKT_TX_TUNNEL_UNKNOWN: return "PKT_TX_TUNNEL_UNKNOWN";
case PKT_TX_MACSEC: return "PKT_TX_MACSEC";
case PKT_TX_SEC_OFFLOAD: return "PKT_TX_SEC_OFFLOAD";
default: return NULL;
@@ -429,6 +431,7 @@ rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t 
buflen)
{ PKT_TX_OUTER_IP_CKSUM, PKT_TX_OUTER_IP_CKSUM, NULL },
{ PKT_TX_OUTER_IPV4, PKT_TX_OUTER_IPV4, NULL },
{ PKT_TX_OUTER_IPV6, PKT_TX_OUTER_IPV6, NULL },
+   { PKT_TX_OUTER_UDP, PKT_TX_OUTER_UDP, NULL },
{ PKT_TX_TUNNEL_VXLAN, PKT_TX_TUNNEL_MASK,
  "PKT_TX_TUNNEL_NONE" },
{ PKT_TX_TUNNEL_GRE, PKT_TX_TUNNEL_MASK,
@@ -439,6 +442,8 @@ rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t 
buflen)
  "PKT_TX_TUNNEL_NONE" },
{ PKT_TX_TUNNEL_MPLSINUDP, PKT_TX_TUNNEL_MASK,
  "PKT_TX_TUNNEL_NONE" },
+   { PKT_TX_TUNNEL_UNKNOWN, PKT_TX_TUNNEL_MASK,
+ "PKT_TX_TUNNEL_NONE" },
{ PKT_TX_MACSEC, PKT_TX_MACSEC, NULL },
{ PKT_TX_SEC_OFFLOAD, PKT_TX_SEC_OFFLOAD, NULL },
};
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..53cc1b713 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -210,6 +210,13 @@ extern "C" {
 #define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
 /**< TX packet with MPLS-in-UDP RFC 7510 header. */
 #define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/**
+ * Used by generic tunnel checksum and TSO. Please refer to document of below
+ * fields to enable this feature on hardware support Generic tunnel offload:
+ *  - DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM
+ *  -

[dpdk-dev] [PATCH v3 3/7] app/testpmd: add more GRE extension to csum engine

2018-03-05 Thread Xueming Li
This patch adds GRE checksum and sequence extension support, in addition
to the key extension, to the csum forwarding engine.

Signed-off-by: Xueming Li 
---
 app/test-pmd/csumonly.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 7b2309372..00ec40d58 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -49,9 +49,12 @@
 #define IP_HDRLEN  0x05 /* default IP header length == five 32-bits words. */
 #define IP_VHL_DEF (IP_VERSION | IP_HDRLEN)
 
-#define GRE_KEY_PRESENT 0x2000
-#define GRE_KEY_LEN 4
-#define GRE_SUPPORTED_FIELDS GRE_KEY_PRESENT
+#define GRE_CHECKSUM_PRESENT   0x8000
+#define GRE_KEY_PRESENT0x2000
+#define GRE_SEQUENCE_PRESENT   0x1000
+#define GRE_EXT_LEN4
+#define GRE_SUPPORTED_FIELDS   (GRE_CHECKSUM_PRESENT | GRE_KEY_PRESENT |\
+GRE_SEQUENCE_PRESENT)
 
 /* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
@@ -203,14 +206,14 @@ parse_gre(struct simple_gre_hdr *gre_hdr, struct 
testpmd_offload_info *info)
struct ipv6_hdr *ipv6_hdr;
uint8_t gre_len = 0;
 
-   /* check which fields are supported */
-   if ((gre_hdr->flags & _htons(~GRE_SUPPORTED_FIELDS)) != 0)
-   return;
-
gre_len += sizeof(struct simple_gre_hdr);
 
if (gre_hdr->flags & _htons(GRE_KEY_PRESENT))
-   gre_len += GRE_KEY_LEN;
+   gre_len += GRE_EXT_LEN;
+   if (gre_hdr->flags & _htons(GRE_SEQUENCE_PRESENT))
+   gre_len += GRE_EXT_LEN;
+   if (gre_hdr->flags & _htons(GRE_CHECKSUM_PRESENT))
+   gre_len += GRE_EXT_LEN;
 
if (gre_hdr->proto == _htons(ETHER_TYPE_IPv4)) {
info->is_tunnel = 1;
@@ -739,6 +742,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 
/* step 3: fill the mbuf meta data (flags and header lengths) */
 
+   m->tx_offload = 0;
if (info.is_tunnel == 1) {
if (info.tunnel_tso_segsz ||
(tx_offloads &
-- 
2.13.3



[dpdk-dev] [PATCH v3 5/7] net/mlx5: separate TSO function in Tx data path

2018-03-05 Thread Xueming Li
Separate the TSO function to make the logic of mlx5_tx_burst clearer.

Signed-off-by: Xueming Li 
---
 drivers/net/mlx5/mlx5_rxtx.c | 112 ++-
 1 file changed, 67 insertions(+), 45 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 049f7e6c1..6d273841b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -219,6 +219,66 @@ mlx5_copy_to_wq(void *dst, const void *src, size_t n,
 }
 
 /**
+ * Inline TSO headers into WQE.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+static int
+inline_tso(struct mlx5_txq_data *txq, struct rte_mbuf *buf,
+  uint32_t *length,
+  uint8_t *cs_flags,
+  uintptr_t *addr,
+  uint16_t *pkt_inline_sz,
+  uint8_t **raw,
+  uint16_t *max_wqe,
+  uint16_t *tso_segsz,
+  uint16_t *tso_header_sz)
+{
+   uintptr_t end = (uintptr_t)(((uintptr_t)txq->wqes) +
+   (1 << txq->wqe_n) * MLX5_WQE_SIZE);
+   unsigned int copy_b;
+   uint8_t vlan_sz = (buf->ol_flags & PKT_TX_VLAN_PKT) ? 4 : 0;
+   const uint8_t tunneled = txq->tunnel_en &&
+(buf->ol_flags & (PKT_TX_TUNNEL_GRE |
+  PKT_TX_TUNNEL_VXLAN));
+   uint16_t n_wqe;
+
+   *tso_segsz = buf->tso_segsz;
+   *tso_header_sz = buf->l2_len + vlan_sz + buf->l3_len + buf->l4_len;
+   if (unlikely(*tso_segsz == 0 || *tso_header_sz == 0)) {
+   txq->stats.oerrors++;
+   return -EINVAL;
+   }
+   if (tunneled) {
+   *tso_header_sz += buf->outer_l2_len + buf->outer_l3_len;
+   *cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM;
+   } else {
+   *cs_flags |= MLX5_ETH_WQE_L4_CSUM;
+   }
+   if (unlikely(*tso_header_sz > MLX5_MAX_TSO_HEADER)) {
+   txq->stats.oerrors++;
+   return -EINVAL;
+   }
+   copy_b = *tso_header_sz - *pkt_inline_sz;
+   /* First seg must contain all TSO headers. */
+   assert(copy_b <= *length);
+   if (!copy_b || ((end - (uintptr_t)*raw) < copy_b))
+   return -EAGAIN;
+   n_wqe = (MLX5_WQE_DS(copy_b) - 1 + 3) / 4;
+   if (unlikely(*max_wqe < n_wqe))
+   return -EINVAL;
+   *max_wqe -= n_wqe;
+   rte_memcpy((void *)*raw, (void *)*addr, copy_b);
+   *length -= copy_b;
+   *addr += copy_b;
+   copy_b = MLX5_WQE_DS(copy_b) * MLX5_WQE_DWORD_SIZE;
+   *pkt_inline_sz += copy_b;
+   *raw += copy_b;
+   return 0;
+}
+
+/**
  * DPDK callback to check the status of a tx descriptor.
  *
  * @param tx_queue
@@ -352,6 +412,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
 #ifdef MLX5_PMD_SOFT_COUNTERS
uint32_t total_length = 0;
 #endif
+   int ret;
 
/* first_seg */
buf = *pkts;
@@ -417,52 +478,13 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
raw += MLX5_WQE_DWORD_SIZE;
tso = txq->tso_en && (buf->ol_flags & PKT_TX_TCP_SEG);
if (tso) {
-   uintptr_t end =
-   (uintptr_t)(((uintptr_t)txq->wqes) +
-   (1 << txq->wqe_n) * MLX5_WQE_SIZE);
-   unsigned int copy_b;
-   uint8_t vlan_sz =
-   (buf->ol_flags & PKT_TX_VLAN_PKT) ? 4 : 0;
-   const uint64_t is_tunneled =
-   buf->ol_flags & (PKT_TX_TUNNEL_GRE |
-PKT_TX_TUNNEL_VXLAN);
-
-   tso_header_sz = buf->l2_len + vlan_sz +
-   buf->l3_len + buf->l4_len;
-   tso_segsz = buf->tso_segsz;
-   if (unlikely(tso_segsz == 0)) {
-   txq->stats.oerrors++;
+   ret = inline_tso(txq, buf, &length, &cs_flags,
+&addr, &pkt_inline_sz,
+&raw, &max_wqe,
+&tso_segsz, &tso_header_sz);
+   if (ret == -EINVAL) {
break;
-   }
-   if (is_tunneled && txq->tunnel_en) {
-   tso_header_sz += buf->outer_l2_len +
-buf->outer_l3_len;
-   cs_flags |= MLX5_ETH_WQE_L4_INNER_CSUM;
-   } else {
-   cs_flags |= MLX5_ETH_WQE_L4_CSUM;
-   }
-   if (unlikely(tso_header_sz > MLX5_MAX_TSO_HEADER)) {
-   txq->stats.oerrors++;
-   break;
-  

[dpdk-dev] [PATCH v3 0/7] support generic tunnel Tx checksum and TSO

2018-03-05 Thread Xueming Li
- Add VXLAN-GPE and GRE extension support to testpmd csum forwarding engine
- Split DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM_TSO into 
DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM
  and DEV_TX_OFFLOAD_GENERIC_TNL_TSO
- Add PKT_TX_TUNNEL_UNKNOWN and PKT_TX_OUTER_UDP

  http://www.dpdk.org/dev/patchwork/patch/34655/


This patchset adds a new HW TX capability for generic tunnel checksum and TSO
offloads: HW supporting generic tunnel offloading can handle new tunnel
type offloading without a HW upgrade.

This is achieved by informing HW of the offsets and types of the headers; HW
then does the checksum calculation and TSO segmentation based on the packet's
inner and outer header offsets, regardless of tunnel type.

Xueming Li (7):
  ethdev: introduce Tx generic tunnel L3/L4 offload
  app/testpmd: testpmd support Tx generic tunnel offloads
  app/testpmd: add more GRE extension to csum engine
  app/testpmd: introduce VXLAN GPE to csum forwarding engine
  net/mlx5: separate TSO function in Tx data path
  net/mlx5: support generic tunnel offloading
  net/mlx5: allow max 192B TSO inline header length

 app/test-pmd/cmdline.c|   9 +-
 app/test-pmd/config.c |  18 +++
 app/test-pmd/csumonly.c   | 117 +--
 app/test-pmd/parameters.c |  12 +-
 app/test-pmd/testpmd.h|   2 +
 doc/guides/nics/mlx5.rst  |   8 ++
 doc/guides/testpmd_app_ug/run_app.rst |   5 +
 drivers/net/mlx5/Makefile |   5 +
 drivers/net/mlx5/mlx5.c   |  28 -
 drivers/net/mlx5/mlx5.h   |   1 +
 drivers/net/mlx5/mlx5_defs.h  |   2 +-
 drivers/net/mlx5/mlx5_ethdev.c|   4 +-
 drivers/net/mlx5/mlx5_prm.h   |  24 
 drivers/net/mlx5/mlx5_rxtx.c  | 208 --
 drivers/net/mlx5/mlx5_rxtx.h  |  98 
 drivers/net/mlx5/mlx5_rxtx_vec.c  |   9 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |   2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |   2 +-
 drivers/net/mlx5/mlx5_txq.c   |  12 +-
 lib/librte_ether/rte_ethdev.h |  24 
 lib/librte_mbuf/rte_mbuf.c|   5 +
 lib/librte_mbuf/rte_mbuf.h|  18 ++-
 22 files changed, 500 insertions(+), 113 deletions(-)

-- 
2.13.3



[dpdk-dev] [PATCH v3 6/7] net/mlx5: support generic tunnel offloading

2018-03-05 Thread Xueming Li
This commit adds support for generic tunnel TSO and checksum offload.
The PMD will compute the inner/outer header offsets according to the
mbuf fields. Hardware will do the calculation based on those offsets and types.

Signed-off-by: Xueming Li 
---
 doc/guides/nics/mlx5.rst  |   8 +++
 drivers/net/mlx5/Makefile |   5 ++
 drivers/net/mlx5/mlx5.c   |  28 ++--
 drivers/net/mlx5/mlx5.h   |   1 +
 drivers/net/mlx5/mlx5_ethdev.c|   4 +-
 drivers/net/mlx5/mlx5_prm.h   |  24 +++
 drivers/net/mlx5/mlx5_rxtx.c  | 122 ++
 drivers/net/mlx5/mlx5_rxtx.h  |  98 +--
 drivers/net/mlx5/mlx5_rxtx_vec.c  |   9 +--
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |   2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |   2 +-
 drivers/net/mlx5/mlx5_txq.c   |  12 +++-
 12 files changed, 251 insertions(+), 64 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0e6e525c9..c47a7b2a9 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -334,6 +334,14 @@ Run-time configuration
 
   Enabled by default.
 
+- ``swp`` parameter [int]
+
+  A nonzero value enables the TX SW parser to support generic tunnel TSO and
+  checksum offloading. Please refer to ``DEV_TX_OFFLOAD_GENERIC_TNL_TSO``
+  and ``DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM`` for detailed information.
+
+  Disabled by default.
+
 Prerequisites
 -
 
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index afda4118f..c61fc3b0e 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -135,6 +135,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
enum IBV_WQ_FLAG_RX_END_PADDING \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
+   HAVE_IBV_MLX5_MOD_SWP \
+   infiniband/mlx5dv.h \
+   enum MLX5DV_CONTEXT_MASK_SWP \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
HAVE_IBV_MLX5_MOD_MPW \
infiniband/mlx5dv.h \
enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 61cb93101..d7f699b94 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -68,6 +68,9 @@
 /* Device parameter to enable hardware Rx vector. */
 #define MLX5_RX_VEC_EN "rx_vec_en"
 
+/* Device parameter to control Tx SW parser. */
+#define MLX5_TX_SWP "swp"
+
 #ifndef HAVE_IBV_MLX5_MOD_MPW
 #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2)
 #define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3)
@@ -397,6 +400,8 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
config->tx_vec_en = !!tmp;
} else if (strcmp(MLX5_RX_VEC_EN, key) == 0) {
config->rx_vec_en = !!tmp;
+   } else if (strcmp(MLX5_TX_SWP, key) == 0) {
+   config->swp = !!tmp;
} else {
WARN("%s: unknown parameter", key);
return -EINVAL;
@@ -427,6 +432,7 @@ mlx5_args(struct mlx5_dev_config *config, struct 
rte_devargs *devargs)
MLX5_TXQ_MAX_INLINE_LEN,
MLX5_TX_VEC_EN,
MLX5_RX_VEC_EN,
+   MLX5_TX_SWP,
NULL,
};
struct rte_kvargs *kvlist;
@@ -582,6 +588,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
unsigned int mps;
unsigned int cqe_comp;
unsigned int tunnel_en = 0;
+   unsigned int swp = 0;
int idx;
int i;
struct mlx5dv_context attrs_out = {0};
@@ -657,10 +664,9 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
ibv_dev = list[i];
 
DEBUG("device opened");
-   /*
-* Multi-packet send is supported by ConnectX-4 Lx PF as well
-* as all ConnectX-5 devices.
-*/
+#ifdef HAVE_IBV_MLX5_MOD_SWP
+   attrs_out.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
+#endif
 #ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
attrs_out.comp_mask |= MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS;
 #endif
@@ -677,6 +683,11 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
DEBUG("MPW isn't supported");
mps = MLX5_MPW_DISABLED;
}
+#ifdef HAVE_IBV_MLX5_MOD_SWP
+   if (attrs_out.comp_mask & MLX5DV_CONTEXT_MASK_SWP)
+   swp = attrs_out.sw_parsing_caps.sw_parsing_offloads;
+   DEBUG("SWP support: %u", swp);
+#endif
if (RTE_CACHE_LINE_SIZE == 128 &&
!(attrs_out.flags & MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP))
cqe_comp = 0;
@@ -894,6 +905,11 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
err = priv_uar_init_primary(priv);
if (err)
goto port_error;
+   if (config.swp && !swp) {
+   WARN("Tx SWP isn'

[dpdk-dev] [PATCH v3 2/7] app/testpmd: testpmd support Tx generic tunnel offloads

2018-03-05 Thread Xueming Li
"show port cap" and "csum parse tunnel" command support TX generic
tunnel offloads

Signed-off-by: Xueming Li 
---
 app/test-pmd/cmdline.c  |  9 +++--
 app/test-pmd/config.c   | 18 ++
 app/test-pmd/csumonly.c |  3 ++-
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d1dc1de6c..4f2b31357 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -4013,6 +4013,9 @@ check_tunnel_tso_nic_support(portid_t port_id)
if (!(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_GENEVE_TNL_TSO))
printf("Warning: GENEVE TUNNEL TSO not supported therefore "
   "not enabled for port %d\n", port_id);
+   if (!(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_GENERIC_TNL_TSO))
+   printf("Warning: Generic TUNNEL TSO not supported therefore "
+  "not enabled for port %d\n", port_id);
return dev_info;
 }
 
@@ -4040,13 +4043,15 @@ cmd_tunnel_tso_set_parsed(void *parsed_result,
~(DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
  DEV_TX_OFFLOAD_GRE_TNL_TSO |
  DEV_TX_OFFLOAD_IPIP_TNL_TSO |
- DEV_TX_OFFLOAD_GENEVE_TNL_TSO);
+ DEV_TX_OFFLOAD_GENEVE_TNL_TSO |
+ DEV_TX_OFFLOAD_GENERIC_TNL_TSO);
printf("TSO for tunneled packets is disabled\n");
} else {
uint64_t tso_offloads = (DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
 DEV_TX_OFFLOAD_GRE_TNL_TSO |
 DEV_TX_OFFLOAD_IPIP_TNL_TSO |
-DEV_TX_OFFLOAD_GENEVE_TNL_TSO);
+DEV_TX_OFFLOAD_GENEVE_TNL_TSO |
+DEV_TX_OFFLOAD_GENERIC_TNL_TSO);
 
ports[res->port_id].dev_conf.txmode.offloads |=
(tso_offloads & dev_info.tx_offload_capa);
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 4bb255c62..0e5d1b5f5 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -668,6 +668,15 @@ port_offload_cap_display(portid_t port_id)
printf("off\n");
}
 
+   if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM) {
+   printf("Generic tunnel checksum:  ");
+   if (ports[port_id].dev_conf.txmode.offloads &
+   DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM)
+   printf("on\n");
+   else
+   printf("off\n");
+   }
+
if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) {
printf("TX TCP segmentation:   ");
if (ports[port_id].dev_conf.txmode.offloads &
@@ -722,6 +731,15 @@ port_offload_cap_display(portid_t port_id)
printf("off\n");
}
 
+   if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_GENERIC_TNL_TSO) {
+   printf("Generic tunnel TSO:  ");
+   if (ports[port_id].dev_conf.txmode.offloads &
+   DEV_TX_OFFLOAD_GENERIC_TNL_TSO)
+   printf("on\n");
+   else
+   printf("off\n");
+   }
+
 }
 
 int
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 5f5ab64aa..7b2309372 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -693,7 +693,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
info.l3_len);
parse_vxlan(udp_hdr, &info, m->packet_type);
if (info.is_tunnel)
-   tx_ol_flags |= PKT_TX_TUNNEL_VXLAN;
+   tx_ol_flags |= (PKT_TX_TUNNEL_VXLAN |
+   PKT_TX_OUTER_UDP);
} else if (info.l4_proto == IPPROTO_GRE) {
struct simple_gre_hdr *gre_hdr;
 
-- 
2.13.3



[dpdk-dev] [PATCH v3 7/7] net/mlx5: allow max 192B TSO inline header length

2018-03-05 Thread Xueming Li
Change the max inline header length to 192B to allow IPv6 VXLAN TSO headers
and headers with options that exceed 128B.

Signed-off-by: Xueming Li 
---
 drivers/net/mlx5/mlx5_defs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index c3334ca30..1fbd9d3c5 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -58,7 +58,7 @@
 #define MLX5_MAX_XSTATS 32
 
 /* Maximum Packet headers size (L2+L3+L4) for TSO. */
-#define MLX5_MAX_TSO_HEADER 128
+#define MLX5_MAX_TSO_HEADER 192
 
 /* Default minimum number of Tx queues for vectorized Tx. */
 #define MLX5_VPMD_MIN_TXQS 4
-- 
2.13.3



Re: [dpdk-dev] [PATCH 2/6] net/sfc: add support for driver-wide dynamic logging

2018-03-05 Thread Ferruh Yigit
On 1/25/2018 5:00 PM, Andrew Rybchenko wrote:
> From: Ivan Malov 
> 
> Signed-off-by: Ivan Malov 
> Signed-off-by: Andrew Rybchenko 
> Reviewed-by: Andy Moreton 

<...>

> @@ -2082,3 +2084,14 @@ RTE_PMD_REGISTER_PARAM_STRING(net_sfc_efx,
>   SFC_KVARG_STATS_UPDATE_PERIOD_MS "= "
>   SFC_KVARG_MCDI_LOGGING "=" SFC_KVARG_VALUES_BOOL " "
>   SFC_KVARG_DEBUG_INIT "=" SFC_KVARG_VALUES_BOOL);
> +
> +RTE_INIT(sfc_driver_register_logtype);
> +static void
> +sfc_driver_register_logtype(void)
> +{
> + int ret;
> +
> + ret = rte_log_register_type_and_pick_level(SFC_LOGTYPE_PREFIX "driver",
> +RTE_LOG_NOTICE);

There is no benefit in using rte_log_register_type_and_pick_level() here; at this
stage "opt_loglevel_list" will be empty, so this is the same as rte_log_register().


Re: [dpdk-dev] [PATCH 0/6] net/sfc: implement dynamic logging

2018-03-05 Thread Ferruh Yigit
On 1/25/2018 5:00 PM, Andrew Rybchenko wrote:
> Unfortunately we're a bit late with dynamic logging implementation.
> So, it can wait for 18.05 release cycle if required.
> 
> The series adds EXPERIMENTAL EAL feature which removes dependency
> on EAL arguments processing and log types registration. It stores
> EAL loglevel arguments in the list and adds API function to register
> a new log type and pick up its value from EAL arguments.
> 
> For us it is important since we would like to be able to control
> per-device log level, e.g. pmd.net.sfc.main.:01:00.0.

It is a good idea to have device-level granularity in logging.
I believe other devices would also like to have this, if only there were an
easy way to apply this capability to all PMDs.

> 
> The series already follows log type names format defined recently.
> 
> Ivan Malov (6):
>   eal: register log type and pick level from EAL args
>   net/sfc: add support for driver-wide dynamic logging
>   net/sfc: add support for per-port dynamic logging
>   net/sfc: prepare to merge init logs with main log type
>   net/sfc: remove dedicated init log parameter
>   net/sfc: add dynamic log level for MCDI messages

<...>


Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/7] ethdev: fix port data reset timing

2018-03-05 Thread Ferruh Yigit
On 3/5/2018 2:52 PM, Matan Azrad wrote:
> HI
> 
> From: Ferruh Yigit, Sent: Monday, March 5, 2018 1:24 PM
>> On 1/18/2018 4:35 PM, Matan Azrad wrote:
>>> rte_eth_dev_data structure is allocated per ethdev port and can be
>>> used to get a data of the port internally.
>>>
>>> rte_eth_dev_attach_secondary tries to find the port identifier using
>>> rte_eth_dev_data name field comparison and may get an identifier of
>>> invalid port in case of this port was released by the primary process
>>> because the port release API doesn't reset the port data.
>>>
>>> So, it will be better to reset the port data in release time instead
>>> of allocation time.
>>>
>>> Move the port data reset to the port release API.
>>>
>>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple
>>> process model")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Matan Azrad 
>>> ---
>>>  lib/librte_ether/rte_ethdev.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_ether/rte_ethdev.c
>>> b/lib/librte_ether/rte_ethdev.c index 7044159..156231c 100644
>>> --- a/lib/librte_ether/rte_ethdev.c
>>> +++ b/lib/librte_ether/rte_ethdev.c
>>> @@ -204,7 +204,6 @@ struct rte_eth_dev *
>>> return NULL;
>>> }
>>>
>>> -   memset(&rte_eth_dev_data[port_id], 0, sizeof(struct
>> rte_eth_dev_data));
>>> eth_dev = eth_dev_get(port_id);
>>> snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
>> "%s", name);
>>> eth_dev->data->port_id = port_id;
>>> @@ -252,6 +251,7 @@ struct rte_eth_dev *
>>> if (eth_dev == NULL)
>>> return -EINVAL;
>>>
>>> +   memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
>>
>> Hi Matan,
>>
>> What most of the vdev release path does is:
>>
>> eth_dev = rte_eth_dev_allocated(...)
>> rte_free(eth_dev->data->dev_private);
>> rte_free(eth_dev->data);
>> rte_eth_dev_release_port(eth_dev);
>>
>> Since eth_dev->data freed, memset() it in rte_eth_dev_release_port() will
>> be problem.
>>
>> We don't run remove path that is why we didn't hit the issue but this seems
>> problem for all virtual PMDs.
> 
> Yes, it is a problem and should be fixed:
> For vdevs which use private rte_eth_dev_data the remove order can be:
>   private_data = eth_dev->data;
>   rte_free(eth_dev->data->dev_private);
>   rte_eth_dev_release_port(eth_dev); /* The last operation working on 
> ethdev structure. */
>   rte_free(private_data);

Do we need to save "private_data"?

> 
> 
>> Also rte_eth_dev_pci_release() looks problematic now.
> 
> Yes, again, the last operation working on ethdev structure should be 
> rte_eth_dev_release_port().
> 
> So need to fix all vdevs and the rte_eth_dev_pci_release() function.
> 
> Any comments?
> 



Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/7] ethdev: fix port data reset timing

2018-03-05 Thread Matan Azrad
Hi Ferruh

From: Ferruh Yigit, Sent: Monday, March 5, 2018 5:07 PM
> On 3/5/2018 2:52 PM, Matan Azrad wrote:
> > HI
> >
> > From: Ferruh Yigit, Sent: Monday, March 5, 2018 1:24 PM
> >> On 1/18/2018 4:35 PM, Matan Azrad wrote:
> >>> rte_eth_dev_data structure is allocated per ethdev port and can be
> >>> used to get a data of the port internally.
> >>>
> >>> rte_eth_dev_attach_secondary tries to find the port identifier using
> >>> rte_eth_dev_data name field comparison and may get an identifier of
> >>> invalid port in case of this port was released by the primary
> >>> process because the port release API doesn't reset the port data.
> >>>
> >>> So, it will be better to reset the port data in release time instead
> >>> of allocation time.
> >>>
> >>> Move the port data reset to the port release API.
> >>>
> >>> Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple
> >>> process model")
> >>> Cc: sta...@dpdk.org
> >>>
> >>> Signed-off-by: Matan Azrad 
> >>> ---
> >>>  lib/librte_ether/rte_ethdev.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/librte_ether/rte_ethdev.c
> >>> b/lib/librte_ether/rte_ethdev.c index 7044159..156231c 100644
> >>> --- a/lib/librte_ether/rte_ethdev.c
> >>> +++ b/lib/librte_ether/rte_ethdev.c
> >>> @@ -204,7 +204,6 @@ struct rte_eth_dev *
> >>>   return NULL;
> >>>   }
> >>>
> >>> - memset(&rte_eth_dev_data[port_id], 0, sizeof(struct
> >> rte_eth_dev_data));
> >>>   eth_dev = eth_dev_get(port_id);
> >>>   snprintf(eth_dev->data->name, sizeof(eth_dev->data->name),
> >> "%s", name);
> >>>   eth_dev->data->port_id = port_id;
> >>> @@ -252,6 +251,7 @@ struct rte_eth_dev *
> >>>   if (eth_dev == NULL)
> >>>   return -EINVAL;
> >>>
> >>> + memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> >>
> >> Hi Matan,
> >>
> >> What most of the vdev release path does is:
> >>
> >> eth_dev = rte_eth_dev_allocated(...)
> >> rte_free(eth_dev->data->dev_private);
> >> rte_free(eth_dev->data);
> >> rte_eth_dev_release_port(eth_dev);
> >>
> >> Since eth_dev->data freed, memset() it in rte_eth_dev_release_port()
> >> will be problem.
> >>
> >> We don't run remove path that is why we didn't hit the issue but this
> >> seems problem for all virtual PMDs.
> >
> > Yes, it is a problem and should be fixed:
> > For vdevs which use private rte_eth_dev_data the remove order can be:
> > private_data = eth_dev->data;
> > rte_free(eth_dev->data->dev_private);
> > rte_eth_dev_release_port(eth_dev); /* The last operation working
> on ethdev structure. */
> > rte_free(private_data);
> 
> Do we need to save "private_data"?

Just to emphasize that the eth_dev structure should no longer be used after
rte_eth_dev_release_port().
Maybe in the future rte_eth_dev_release_port() will zero eth_dev structure too 
:)

> >
> >
> >> Also rte_eth_dev_pci_release() looks problematic now.
> >
> > Yes, again, the last operation working on ethdev structure should be
> rte_eth_dev_release_port().
> >
> > So need to fix all vdevs and the rte_eth_dev_pci_release() function.
> >
> > Any comments?
> >



Re: [dpdk-dev] [PATCH] net/null: Support bulk alloc and free.

2018-03-05 Thread Ferruh Yigit
On 2/3/2018 3:11 AM, Mallesh Koujalagi wrote:
> Bulk allocation and freeing of multiple mbufs increases throughput by more
> than ~2% on a single core.
> 
> Signed-off-by: Mallesh Koujalagi 
> ---
>  drivers/net/null/rte_eth_null.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> index 9385ffd..247ede0 100644
> --- a/drivers/net/null/rte_eth_null.c
> +++ b/drivers/net/null/rte_eth_null.c
> @@ -130,10 +130,11 @@ eth_null_copy_rx(void *q, struct rte_mbuf **bufs, 
> uint16_t nb_bufs)
>   return 0;
>  
>   packet_size = h->internals->packet_size;
> +
> + if (rte_pktmbuf_alloc_bulk(h->mb_pool, bufs, nb_bufs) != 0)
> + return 0;
> +
>   for (i = 0; i < nb_bufs; i++) {
> - bufs[i] = rte_pktmbuf_alloc(h->mb_pool);
> - if (!bufs[i])
> - break;
>   rte_memcpy(rte_pktmbuf_mtod(bufs[i], void *), h->dummy_packet,
>   packet_size);
>   bufs[i]->data_len = (uint16_t)packet_size;
> @@ -149,18 +150,15 @@ eth_null_copy_rx(void *q, struct rte_mbuf **bufs, 
> uint16_t nb_bufs)
>  static uint16_t
>  eth_null_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>  {
> - int i;
>   struct null_queue *h = q;
>  
>   if ((q == NULL) || (bufs == NULL))
>   return 0;
>  
> - for (i = 0; i < nb_bufs; i++)
> - rte_pktmbuf_free(bufs[i]);
> + rte_mempool_put_bulk(bufs[0]->pool, (void **)bufs, nb_bufs);

Is it guaranteed that all mbufs will be from the same mempool?

> + rte_atomic64_add(&h->tx_pkts, nb_bufs);
>  
> - rte_atomic64_add(&(h->tx_pkts), i);
> -
> - return i;
> + return nb_bufs;
>  }
>  
>  static uint16_t
> 



Re: [dpdk-dev] [PATCH] net/bnx2x: reserve enough headroom for mbuf prepend

2018-03-05 Thread Ferruh Yigit
On 2/6/2018 11:21 AM, zhouyangchao wrote:

Can you please provide more information on why this patch is needed?

> Signed-off-by: Yangchao Zhou 
> ---
>  drivers/net/bnx2x/bnx2x_rxtx.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bnx2x/bnx2x_rxtx.c b/drivers/net/bnx2x/bnx2x_rxtx.c
> index a0d4ac9..d8a3225 100644
> --- a/drivers/net/bnx2x/bnx2x_rxtx.c
> +++ b/drivers/net/bnx2x/bnx2x_rxtx.c
> @@ -140,7 +140,8 @@ bnx2x_dev_rx_queue_setup(struct rte_eth_dev *dev,
>   return -ENOMEM;
>   }
>   rxq->sw_ring[idx] = mbuf;
> - rxq->rx_ring[idx] = mbuf->buf_iova;
> + rxq->rx_ring[idx] = 
> + rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
>   }
>   rxq->pkt_first_seg = NULL;
>   rxq->pkt_last_seg = NULL;
> @@ -400,7 +401,8 @@ bnx2x_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, 
> uint16_t nb_pkts)
>  
>   rx_mb = rxq->sw_ring[bd_cons];
>   rxq->sw_ring[bd_cons] = new_mb;
> - rxq->rx_ring[bd_prod] = new_mb->buf_iova;
> + rxq->rx_ring[bd_prod] = 
> + rte_cpu_to_le_64(rte_mbuf_data_iova_default(new_mb));
>  
>   rx_pref = NEXT_RX_BD(bd_cons) & MAX_RX_BD(rxq);
>   rte_prefetch0(rxq->sw_ring[rx_pref]);
> @@ -409,7 +411,7 @@ bnx2x_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, 
> uint16_t nb_pkts)
>   rte_prefetch0(&rxq->sw_ring[rx_pref]);
>   }
>  
> - rx_mb->data_off = pad;
> + rx_mb->data_off = pad + RTE_PKTMBUF_HEADROOM;
>   rx_mb->nb_segs = 1;
>   rx_mb->next = NULL;
>   rx_mb->pkt_len = rx_mb->data_len = len;
> 



Re: [dpdk-dev] [PATCH] net/null: Support bulk alloc and free.

2018-03-05 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Monday, March 5, 2018 3:25 PM
> To: Koujalagi, MalleshX ; dev@dpdk.org
> Cc: mtetsu...@gmail.com
> Subject: Re: [dpdk-dev] [PATCH] net/null: Support bulk alloc and free.
> 
> On 2/3/2018 3:11 AM, Mallesh Koujalagi wrote:
> > Bulk allocation and freeing of multiple mbufs increases throughput by more
> > than ~2% on a single core.
> >
> > Signed-off-by: Mallesh Koujalagi 
> > ---
> >  drivers/net/null/rte_eth_null.c | 16 +++-
> >  1 file changed, 7 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/null/rte_eth_null.c 
> > b/drivers/net/null/rte_eth_null.c
> > index 9385ffd..247ede0 100644
> > --- a/drivers/net/null/rte_eth_null.c
> > +++ b/drivers/net/null/rte_eth_null.c
> > @@ -130,10 +130,11 @@ eth_null_copy_rx(void *q, struct rte_mbuf **bufs, 
> > uint16_t nb_bufs)
> > return 0;
> >
> > packet_size = h->internals->packet_size;
> > +
> > +   if (rte_pktmbuf_alloc_bulk(h->mb_pool, bufs, nb_bufs) != 0)
> > +   return 0;
> > +
> > for (i = 0; i < nb_bufs; i++) {
> > -   bufs[i] = rte_pktmbuf_alloc(h->mb_pool);
> > -   if (!bufs[i])
> > -   break;
> > rte_memcpy(rte_pktmbuf_mtod(bufs[i], void *), h->dummy_packet,
> > packet_size);
> > bufs[i]->data_len = (uint16_t)packet_size;
> > @@ -149,18 +150,15 @@ eth_null_copy_rx(void *q, struct rte_mbuf **bufs, 
> > uint16_t nb_bufs)
> >  static uint16_t
> >  eth_null_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> >  {
> > -   int i;
> > struct null_queue *h = q;
> >
> > if ((q == NULL) || (bufs == NULL))
> > return 0;
> >
> > -   for (i = 0; i < nb_bufs; i++)
> > -   rte_pktmbuf_free(bufs[i]);
> > +   rte_mempool_put_bulk(bufs[0]->pool, (void **)bufs, nb_bufs);
> 
> Is it guaranteed that all mbufs will be from the same mempool?

I don't think it does, plus
rte_pktmbuf_free(mb) != rte_mempool_put_bulk(mb->pool, &mb, 1);
Konstantin
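
For illustration, a sketch (not part of the patch) of a bulk free that stays correct when mbufs may come from different mempools, be chained, or have refcnt > 1: it uses rte_pktmbuf_prefree_seg() and batches consecutive mbufs that resolve to the same pool, falling back to rte_pktmbuf_free() otherwise.

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    static void
    bulk_free_safe(struct rte_mbuf **bufs, uint16_t nb_bufs)
    {
        void *batch[64];
        struct rte_mempool *pool = NULL;
        unsigned int n = 0;
        uint16_t i;

        for (i = 0; i < nb_bufs; i++) {
            struct rte_mbuf *m = bufs[i];

            if (m->next != NULL) {
                rte_pktmbuf_free(m);   /* multi-segment: generic free path */
                continue;
            }
            if (rte_pktmbuf_prefree_seg(m) == NULL)
                continue;              /* still referenced elsewhere */
            if (n != 0 && (n == RTE_DIM(batch) || m->pool != pool)) {
                rte_mempool_put_bulk(pool, batch, n);
                n = 0;
            }
            pool = m->pool;
            batch[n++] = m;
        }
        if (n != 0)
            rte_mempool_put_bulk(pool, batch, n);
    }
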

> 
> > +   rte_atomic64_add(&h->tx_pkts, nb_bufs);
> >
> > -   rte_atomic64_add(&(h->tx_pkts), i);
> > -
> > -   return i;
> > +   return nb_bufs;
> >  }
> >
> >  static uint16_t
> >



[dpdk-dev] [PATCH] vhost: maintain separate virtio features field

2018-03-05 Thread Tomasz Kulasek
There are two separate abstraction layers:
* vsocket - which represents a unix domain socket
* virtio_net - which represents a vsocket connection

There can be many connections on the same socket. vsocket provides an
API to enable/disable particular virtio features on the fly, but it's
the virtio_net that uses these features.

virtio_net used to rely on vsocket->features during feature negotiation,
breaking the layer encapsulation (and also causing a deadlock - two locks
were being taken in a different order). Now each virtio_net device has
its own copy of the vsocket features, created at the time of virtio_net
creation.

vsocket->features still has to be present, as features can be
enabled/disabled while no virtio_net device has been created yet.

Signed-off-by: Dariusz Stojaczyk 
Signed-off-by: Tomasz Kulasek 
---
 lib/librte_vhost/socket.c |  2 +-
 lib/librte_vhost/vhost.c  |  9 +
 lib/librte_vhost/vhost.h  |  8 +---
 lib/librte_vhost/vhost_user.c | 33 +
 4 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 83befdced..260e38dbe 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -188,7 +188,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket 
*vsocket)
return;
}
 
-   vid = vhost_new_device();
+   vid = vhost_new_device(vsocket->features);
if (vid == -1) {
goto err;
}
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index a407067e2..a307a19ed 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -256,7 +256,7 @@ reset_device(struct virtio_net *dev)
 {
uint32_t i;
 
-   dev->features = 0;
+   dev->negotiated_features = 0;
dev->protocol_features = 0;
dev->flags &= VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 
@@ -269,7 +269,7 @@ reset_device(struct virtio_net *dev)
  * there is a new virtio device being attached).
  */
 int
-vhost_new_device(void)
+vhost_new_device(uint64_t features)
 {
struct virtio_net *dev;
int i;
@@ -296,6 +296,7 @@ vhost_new_device(void)
dev->vid = i;
dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
dev->slave_req_fd = -1;
+   dev->features = features;
 
return i;
 }
@@ -376,7 +377,7 @@ rte_vhost_get_mtu(int vid, uint16_t *mtu)
if (!(dev->flags & VIRTIO_DEV_READY))
return -EAGAIN;
 
-   if (!(dev->features & (1ULL << VIRTIO_NET_F_MTU)))
+   if (!(dev->negotiated_features & (1ULL << VIRTIO_NET_F_MTU)))
return -ENOTSUP;
 
*mtu = dev->mtu;
@@ -458,7 +459,7 @@ rte_vhost_get_negotiated_features(int vid, uint64_t 
*features)
if (!dev)
return -1;
 
-   *features = dev->features;
+   *features = dev->negotiated_features;
return 0;
 }
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d947bc9e3..efbc89857 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -217,6 +217,7 @@ struct virtio_net {
/* Frontend (QEMU) memory and memory region information */
struct rte_vhost_memory *mem;
uint64_tfeatures;
+   uint64_tnegotiated_features;
uint64_tprotocol_features;
int vid;
uint32_tflags;
@@ -266,8 +267,9 @@ vhost_log_write(struct virtio_net *dev, uint64_t addr, 
uint64_t len)
 {
uint64_t page;
 
-   if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
-  !dev->log_base || !len))
+   if (likely(((dev->negotiated_features &
+   (1ULL << VHOST_F_LOG_ALL)) == 0) || !dev->log_base ||
+   !len))
return;
 
if (unlikely(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8)))
@@ -347,7 +349,7 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t 
size)
 
 struct virtio_net *get_device(int vid);
 
-int vhost_new_device(void);
+int vhost_new_device(uint64_t features);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 65ee33919..818fc4263 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -132,10 +132,7 @@ vhost_user_reset_owner(struct virtio_net *dev)
 static uint64_t
 vhost_user_get_features(struct virtio_net *dev)
 {
-   uint64_t features = 0;
-
-   rte_vhost_driver_get_features(dev->ifname, &features);
-   return features;
+   return dev->features;
 }
 
 /*
@@ -146,7 +143,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
 {
uint64_t vhost_features = 0;
 
-   rte_vhost_driver_get_features(dev->ifname, &vhost_features);
+   vhost_features = vhost_user_get_featu

[dpdk-dev] [PATCH] vhost: add API for getting last_idx of vrings

2018-03-05 Thread Tomasz Kulasek
vhost-net devices might keep track of the last descriptor indices by
themselves, assuming they initially start at 0, but that is not the
case for vhost-scsi. Initial last descriptor indices are set via the
VHOST_USER_SET_VRING_BASE message, and we cannot possibly predict what
they will be. Setting these to vqueue->used->idx is also not an option,
because there might be some yet-unprocessed requests between these and
the actual last_idx. This patch adds an API for getting/setting the last
descriptor indices of vrings, so that they can be synchronized between
the user-device and rte_vhost.

The last_idx flow could be as follows (see the sketch after this list):

 * vhost start,
 * received SET_VRING_BASE msg, last_idx is set on rte_vhost side,
 * created user-device, last_idx pulled from rte_vhost,
 * requests are being processed by user-device, last_idx changes,
 * destroyed user-device, last_idx pushed to rte_vhost,
 * at this point, vrings could be recreated and another SET_VRING_BASE
   message could arrive, so last_idx would be set
 * recreated user-device, last_idx pulled from rte_vhost.
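
A minimal usage sketch of that flow (illustrative only; the two rte_vhost_*
calls are the ones touched or added by this patch, everything else is assumed
application code):

 static void
 sync_last_idx_example(int vid, uint16_t vring_idx)
 {
        struct rte_vhost_vring vring;
        uint16_t last_avail_idx = 0, last_used_idx = 0;

        /* On user-device creation: pull the indices kept by rte_vhost. */
        if (rte_vhost_get_vhost_vring(vid, vring_idx, &vring) == 0) {
                last_avail_idx = vring.last_avail_idx;
                last_used_idx = vring.last_used_idx;
        }

        /* ... the user-device processes requests, both indices advance ... */

        /* On user-device destruction: push the indices back to rte_vhost. */
        rte_vhost_set_vhost_vring_last_idx(vid, vring_idx,
                        last_avail_idx, last_used_idx);
 }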


Signed-off-by: Dariusz Stojaczyk 
Signed-off-by: Tomasz Kulasek 
---
 lib/librte_vhost/rte_vhost.h | 24 
 lib/librte_vhost/vhost.c | 27 +++
 2 files changed, 51 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index d33206997..b9ba058d1 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -62,6 +62,9 @@ struct rte_vhost_vring {
 
int kickfd;
uint16_tsize;
+
+   uint16_tlast_avail_idx;
+   uint16_tlast_used_idx;
 };
 
 /**
@@ -434,6 +437,27 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
 
+/**
+ * Set id of the last descriptors in avail and used guest vrings.
+ *
+ * In case user application operates directly on buffers, it should use this
+ * function on device destruction to retrieve the same values later on in 
device
+ * creation via rte_vhost_get_vhost_vring(int, uint16_t, struct 
rte_vhost_vring *)
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @param last_avail_idx
+ *  id of the last descriptor in avail ring to be set
+ * @param last_used_idx
+ *  id of the last descriptor in used ring to be set
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_set_vhost_vring_last_idx(int vid, uint16_t vring_idx,
+ uint16_t last_avail_idx, uint16_t last_used_idx);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index a407067e2..a82dc5a62 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -512,6 +512,9 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
vring->kickfd  = vq->kickfd;
vring->size= vq->size;
 
+   vring->last_avail_idx = vq->last_avail_idx;
+   vring->last_used_idx = vq->last_used_idx;
+
return 0;
 }
 
@@ -627,3 +630,27 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid)
 
return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
 }
+
+int
+rte_vhost_set_vhost_vring_last_idx(int vid, uint16_t vring_idx,
+ uint16_t last_avail_idx, uint16_t last_used_idx)
+{
+   struct virtio_net *dev;
+   struct vhost_virtqueue *vq;
+
+   dev = get_device(vid);
+   if (!dev)
+   return -1;
+
+   if (vring_idx >= VHOST_MAX_VRING)
+   return -1;
+
+   vq = dev->virtqueue[vring_idx];
+   if (!vq)
+   return -1;
+
+   vq->last_avail_idx = last_avail_idx;
+   vq->last_used_idx = last_used_idx;
+
+   return 0;
+}
-- 
2.14.1



[dpdk-dev] [PATCH] vhost: stop device before updating public vring data

2018-03-05 Thread Tomasz Kulasek
Until now, DPDK has assumed that callfd, kickfd and last_idx are set just
once during vring initialization and that the device cannot be running while
DPDK receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
However, that assumption is wrong. For vhost-scsi, messages might arrive
at any point in time, possibly multiple times, one after another.

QEMU issues SET_VRING_CALL once during device initialization, then again
during device start. The second message will close the previous callfd,
which is still being used by the user implementation of the vhost device.
This results in writing to an invalid (closed) callfd.

Other messages like SET_FEATURES, SET_VRING_ADDR, etc. will also change the
internal state of the VQ or device. To prevent a race condition, the device
should also be stopped before updating vring data (see the sketch below).
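
The guard that the patch repeats before each handler below could equivalently
be read as a small helper (the helper name is illustrative; the patch itself
open-codes these three lines in every handler):

 /* Remove the device from the data plane before touching vring state. */
 static void
 stop_device_sketch(struct virtio_net *dev)
 {
        if (dev->flags & VIRTIO_DEV_RUNNING) {
                dev->flags &= ~VIRTIO_DEV_RUNNING;
                dev->notify_ops->destroy_device(dev->vid);
        }
 }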

Signed-off-by: Dariusz Stojaczyk 
Signed-off-by: Pawel Wodkowski 
Signed-off-by: Tomasz Kulasek 
---
 lib/librte_vhost/vhost_user.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 65ee33919..3895e6edd 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -172,6 +172,10 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
 
if (dev->notify_ops->features_changed)
dev->notify_ops->features_changed(dev->vid, features);
+   else {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
}
 
dev->features = features;
@@ -487,6 +491,12 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, 
VhostUserMsg *msg)
if (dev->mem == NULL)
return -1;
 
+   /* Remove from the data plane. */
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
/* addr->index refers to the queue index. The txq 1, rxq is 0. */
vq = dev->virtqueue[msg->payload.addr.index];
 
@@ -517,6 +527,12 @@ static int
 vhost_user_set_vring_base(struct virtio_net *dev,
  VhostUserMsg *msg)
 {
+   /* Remove from the data plane. */
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
dev->virtqueue[msg->payload.state.index]->last_used_idx  =
msg->payload.state.num;
dev->virtqueue[msg->payload.state.index]->last_avail_idx =
@@ -796,6 +812,12 @@ vhost_user_set_vring_call(struct virtio_net *dev, struct 
VhostUserMsg *pmsg)
struct vhost_vring_file file;
struct vhost_virtqueue *vq;
 
+   /* Remove from the data plane. */
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
file.fd = VIRTIO_INVALID_EVENTFD;
@@ -818,6 +840,12 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct 
VhostUserMsg *pmsg)
struct vhost_virtqueue *vq;
struct virtio_net *dev = *pdev;
 
+   /* Remove from the data plane. */
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
file.fd = VIRTIO_INVALID_EVENTFD;
@@ -959,6 +987,12 @@ vhost_user_set_protocol_features(struct virtio_net *dev,
if (protocol_features & ~VHOST_USER_PROTOCOL_FEATURES)
return;
 
+   /* Remove from the data plane. */
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
dev->protocol_features = protocol_features;
 }
 
@@ -981,6 +1015,12 @@ vhost_user_set_log_base(struct virtio_net *dev, struct 
VhostUserMsg *msg)
return -1;
}
 
+   /* Remove from the data plane. */
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
+   dev->notify_ops->destroy_device(dev->vid);
+   }
+
size = msg->payload.log.mmap_size;
off  = msg->payload.log.mmap_offset;
RTE_LOG(INFO, VHOST_CONFIG,
-- 
2.14.1



Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

2018-03-05 Thread Adrien Mazarguil
On Mon, Feb 26, 2018 at 05:44:01PM +, Doherty, Declan wrote:
> On 13/02/2018 5:05 PM, Adrien Mazarguil wrote:
> > Hi,
> > 
> > Apologies for being late to this thread, I've read the ensuing discussion
> > (hope I didn't miss any) and also think rte_flow could be improved in
> > several ways to enable TEP support, in particular regarding the ordering of
> > actions.
> > 
> > On the other hand I'm not sure a dedicated API for TEP is needed at all. I'm
> > not convinced rte_security chose the right path and would like to avoid
> > repeating the same mistakes if possible, more below.
> > 
> > On Thu, Dec 21, 2017 at 10:21:13PM +, Doherty, Declan wrote:
> > > This RFC contains a proposal to add a new tunnel endpoint API to DPDK 
> > > that when used
> > > in conjunction with rte_flow enables the configuration of inline data 
> > > path encapsulation
> > > and decapsulation of tunnel endpoint network overlays on accelerated IO 
> > > devices.
> > > 
> > > The proposed new API would provide for the creation, destruction, and
> > > monitoring of a tunnel endpoint in supporting hw, as well as capabilities 
> > > APIs to allow the
> > > acceleration features to be discovered by applications.

> > Although I'm not convinced an opaque object is the right approach, if we
> > choose this route I suggest the much simpler:
> > 
> >   struct rte_flow_action_tep_(encap|decap) {
> >   struct rte_tep *tep;
> >   uint32_t flow_id;
> >   };
> > 
> 
> That's a fair point. The only other item that the encap/decap actions
> currently supported was the Ethernet item, and going back to a comment
> from Boris, having the Ethernet header separate from the tunnel is
> probably not ideal anyway. One of our reasons for using an opaque tep
> item was to allow modification of the TEP independently of all the flows
> being carried on it. So, for instance, if the src or dst MAC needs to be
> modified or the output port needs to be changed, the TEP itself could be
> modified.

Makes sense. I think there's now consensus that without a dedicated API, it
can be done through multiple rte_flow groups and "jump" actions targeting
them. Such actions remain to be formally defined though.

In the meantime there is an alternative approach when opaque pattern
items/actions are unavoidable: by using negative values [1].

In addition to an opaque object to use with rte_flow, a PMD could return a
PMD-specific negative value cast as enum rte_flow_{item,action}_type and
usable with the associated port ID only.

An API could even initialize a pattern item or an action object directly:

 struct rte_flow_action tep_action;
 
 if (rte_tep_create(port_id, &tep_action, ...) != 0)
  rte_panic("no!");
 /*
  * tep_action is now initialized with an opaque type and conf pointer, it
  * can be used with rte_flow_create() as part of an action list.
  */

[1] http://dpdk.org/doc/guides/prog_guide/rte_flow.html#negative-types


> > > struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);
> > > 
> > > Once the tep context is created flows can then be directed to that 
> > > endpoint for
> > > processing. The following sections will outline how the author envisage 
> > > flow
> > > programming will work and also how TEP acceleration can be combined with 
> > > other
> > > accelerations.
> > 
> > In order to allow a single TEP context object to be shared by multiple flow
> > rules, a whole new API must be implemented and applications still have to
> > additionally create one rte_flow rule per TEP flow_id to manage. While this
> > probably results in shorter flow rule patterns and action lists, is it
> > really worth it?
> > 
> > While I understand the reasons for this approach, I'd like to push for a
> > rte_flow-only API as much as possible, I'll provide suggestions below.
> > 
> 
> Not only are the rules shorter to implement, it could also greatly
> reduce the number of cycles required to add flows, both in terms of the
> application marshaling the data into rte_flow patterns and the PMD parsing
> those patterns every time a flow is added. In the case where 10k's of
> flows are getting added per second, this could add a significant overhead
> on the system.

True, although only if the underlying hardware supports it; some PMDs may
still have to update each flow rule independently in order to expose such an
API. Applications can't be certain an update operation will be quick and
atomic.


> > > /** VERY IMPORTANT NOTE **/
> > > One of the core concepts of this proposal is that actions which modify the
> > > packet are defined in the order which they are to be processed. So first 
> > > decap
> > > outer ethernet header, then the outer TEP headers.
> > > I think this is not only logical from a usability point of view, it 
> > > should also
> > > simplify the logic required in PMDs to parse the desired actions.
> > 
> > This. I've been thinking about it for a very long time but never got
> > around to submitting a patch. Handling rte_flow 

Re: [dpdk-dev] OPDL and 18.02 Release Notes

2018-03-05 Thread Ferruh Yigit
On 2/9/2018 12:08 AM, Rosen, Rami wrote:
> Hi all,
> Following the recent announcement of DPDK 18.02-RC4, I went over
> 18.02 release notes and I have this minor query which I am not sure about:
> In the release notes:
> http://dpdk.org/doc/guides/rel_notes/release_18_02.html
> we have the following:
> ...
> The OPDL (Ordered Packet Distribution Library) eventdev
> ...
> 
> While in http://dpdk.org/dev/roadmap
> We have:
> 
> eventdev optimized packet distribution library (OPDL) driver
> ...
> 
> So I am not sure about this inconsistency - should it be "optimized" or
> "ordered"?

According to the driver documentation (doc/guides/eventdevs/opdl.rst) it is
"Ordered Packet Distribution Library", so the release notes seem correct.

cc'ed maintainers.

> 
> Regards,
> Rami Rosen
> 
> 



Re: [dpdk-dev] [PATCH 37/41] net/enic: use contiguous allocation for DMA memory

2018-03-05 Thread John Daley (johndale)
Hi Anatoly,
Looks good, see inline for details.
Acked-by: John Daley 

Thanks,
John

> -Original Message-
> From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> Sent: Saturday, March 03, 2018 5:46 AM
> To: dev@dpdk.org
> Cc: John Daley (johndale) ; Hyong Youb Kim (hyonkim)
> ; keith.wi...@intel.com; jianfeng@intel.com;
> andras.kov...@ericsson.com; laszlo.vadk...@ericsson.com;
> benjamin.wal...@intel.com; bruce.richard...@intel.com;
> tho...@monjalon.net; konstantin.anan...@intel.com;
> kuralamudhan.ramakrish...@intel.com; louise.m.d...@intel.com;
> nelio.laranje...@6wind.com; ys...@mellanox.com; peppe...@japf.ch;
> jerin.ja...@caviumnetworks.com; hemant.agra...@nxp.com;
> olivier.m...@6wind.com
> Subject: [PATCH 37/41] net/enic: use contiguous allocation for DMA memory
> 
> Signed-off-by: Anatoly Burakov 
> ---
> 
> Notes:
> It is not 100% clear that the second call to memzone_reserve
> is allocating DMA memory. Corrections welcome.
The 2nd call is allocating DMA memory, so I believe your patch is correct.
> 
>  drivers/net/enic/enic_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c index
> ec9d343..cb2a7ba 100644
> --- a/drivers/net/enic/enic_main.c
> +++ b/drivers/net/enic/enic_main.c
> @@ -319,7 +319,7 @@ enic_alloc_consistent(void *priv, size_t size,
>   struct enic *enic = (struct enic *)priv;
>   struct enic_memzone_entry *mze;
> 
> - rz = rte_memzone_reserve_aligned((const char *)name,
> + rz = rte_memzone_reserve_aligned_contig((const char *)name,
>size, SOCKET_ID_ANY, 0,
> ENIC_ALIGN);
>   if (!rz) {
>   pr_err("%s : Failed to allocate memory requested for %s\n",
> @@ -787,7 +787,7 @@ int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
>"vnic_cqmsg-%s-%d-%d", enic->bdf_name, queue_idx,
>   instance++);
> 
> - wq->cqmsg_rz = rte_memzone_reserve_aligned((const char *)name,
> + wq->cqmsg_rz = rte_memzone_reserve_aligned_contig((const char
> *)name,
>  sizeof(uint32_t),
>  SOCKET_ID_ANY, 0,
>  ENIC_ALIGN);
This is a send completion landing spot which is DMA'd to by the NIC so it does 
have to be contiguous. However the size is only 4 bytes so it might not matter.
> --
> 2.7.4


Re: [dpdk-dev] [PATCH 01/80] net/sfc: add missing defines for SAL annotation

2018-03-05 Thread Ferruh Yigit
On 2/20/2018 7:33 AM, Andrew Rybchenko wrote:
> Fixes: e1b944598579 ("net/sfc: build libefx")
> 
> Signed-off-by: Andrew Rybchenko 

Series applied to dpdk-next-net/master, thanks.

This was a big set; it may be holding the record for the number of patches.
Also thanks for your effort on splitting the base driver update into multiple
patches...



[dpdk-dev] [PATCH 2/2] event/sw: support device stop flush callback

2018-03-05 Thread Gage Eads
This commit also adds a flush callback test to the sw eventdev's selftest
suite.

Signed-off-by: Gage Eads 
---
 drivers/event/sw/sw_evdev.c  | 25 +++-
 drivers/event/sw/sw_evdev_selftest.c | 75 +++-
 2 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 6672fd8..4b57e5b 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -362,8 +362,25 @@ sw_init_qid_iqs(struct sw_evdev *sw)
 }
 
 static void
-sw_clean_qid_iqs(struct sw_evdev *sw)
+sw_flush_iq(struct rte_eventdev *dev, struct sw_iq *iq)
 {
+   struct sw_evdev *sw = sw_pmd_priv(dev);
+
+   while (iq_count(iq) > 0) {
+   struct rte_event event;
+
+   iq_dequeue_burst(sw, iq, &event, 1);
+
+   dev->dev_stop_flush(dev->data->dev_id,
+   event,
+   dev->dev_stop_flush_arg);
+   }
+}
+
+static void
+sw_clean_qid_iqs(struct rte_eventdev *dev)
+{
+   struct sw_evdev *sw = sw_pmd_priv(dev);
int i, j;
 
/* Release the IQ memory of all configured qids */
@@ -373,7 +390,11 @@ sw_clean_qid_iqs(struct sw_evdev *sw)
for (j = 0; j < SW_IQS_MAX; j++) {
if (!qid->iq[j].head)
continue;
+
+   if (dev->dev_stop_flush)
+   sw_flush_iq(dev, &qid->iq[j]);
iq_free_chunk_list(sw, qid->iq[j].head);
+
qid->iq[j].head = NULL;
}
}
@@ -702,7 +723,7 @@ static void
 sw_stop(struct rte_eventdev *dev)
 {
struct sw_evdev *sw = sw_pmd_priv(dev);
-   sw_clean_qid_iqs(sw);
+   sw_clean_qid_iqs(dev);
sw_xstats_uninit(sw);
sw->started = 0;
rte_smp_wmb();
diff --git a/drivers/event/sw/sw_evdev_selftest.c 
b/drivers/event/sw/sw_evdev_selftest.c
index 78d30e0..f59362e 100644
--- a/drivers/event/sw/sw_evdev_selftest.c
+++ b/drivers/event/sw/sw_evdev_selftest.c
@@ -28,6 +28,7 @@
 #define MAX_PORTS 16
 #define MAX_QIDS 16
 #define NUM_PACKETS (1<<18)
+#define DEQUEUE_DEPTH 128
 
 static int evdev;
 
@@ -147,7 +148,7 @@ init(struct test *t, int nb_queues, int nb_ports)
.nb_event_ports = nb_ports,
.nb_event_queue_flows = 1024,
.nb_events_limit = 4096,
-   .nb_event_port_dequeue_depth = 128,
+   .nb_event_port_dequeue_depth = DEQUEUE_DEPTH,
.nb_event_port_enqueue_depth = 128,
};
int ret;
@@ -2807,6 +2808,72 @@ holb(struct test *t) /* test to check we avoid basic 
head-of-line blocking */
return -1;
 }
 
+static void
+flush(uint8_t dev_id __rte_unused, struct rte_event event, void *arg)
+{
+   *((uint8_t *) arg) += (event.u64 == 0xCA11BACC) ? 1 : 0;
+}
+
+static int
+dev_stop_flush(struct test *t) /* test to check we can properly flush events */
+{
+   const struct rte_event new_ev = {
+   .op = RTE_EVENT_OP_NEW,
+   .u64 = 0xCA11BACC
+   /* all other fields zero */
+   };
+   struct rte_event ev = new_ev;
+   uint8_t count = 0;
+   int i;
+
+   if (init(t, 1, 1) < 0 ||
+   create_ports(t, 1) < 0 ||
+   create_atomic_qids(t, 1) < 0) {
+   printf("%d: Error initializing device\n", __LINE__);
+   return -1;
+   }
+
+   /* Link the queue so *_start() doesn't error out */
+   if (rte_event_port_link(evdev, t->port[0], NULL, NULL, 0) != 1) {
+   printf("%d: Error linking queue to port\n", __LINE__);
+   goto err;
+   }
+
+   if (rte_event_dev_start(evdev) < 0) {
+   printf("%d: Error with start call\n", __LINE__);
+   goto err;
+   }
+
+   for (i = 0; i < DEQUEUE_DEPTH + 1; i++) {
+   if (rte_event_enqueue_burst(evdev, t->port[0], &ev, 1) != 1) {
+   printf("%d: Error enqueuing events\n", __LINE__);
+   goto err;
+   }
+   }
+
+   /* Schedule the events from the port to the IQ. At least one event
+* should be remaining in the queue.
+*/
+   rte_service_run_iter_on_app_lcore(t->service_id, 1);
+
+   if (rte_event_dev_stop_flush_callback_register(evdev, flush, &count)) {
+   printf("%d: Error installing the flush callback\n", __LINE__);
+   goto err;
+   }
+
+   cleanup(t);
+
+   if (count == 0) {
+   printf("%d: Error executing the flush callback\n", __LINE__);
+   goto err;
+   }
+
+   return 0;
+err:
+   rte_event_dev_dump(evdev, stdout);
+   cleanup(t);
+   return -1;
+}
 static int
 worker_loopback_worker_fn(void *arg)
 {
@@ -3211,6 +3278,12 @@ test_sw_eventdev(void)

[dpdk-dev] [PATCH 1/2] eventdev: add device stop flush callback

2018-03-05 Thread Gage Eads
When an event device is stopped, it drains all event queues. These events
may contain pointers, so to prevent memory leaks eventdev now supports a
user-provided flush callback that is called during the queue drain process.
This callback is stored in process memory, so the callback must be
registered by any process that may call rte_event_dev_stop().

This commit also clarifies the behavior of rte_event_dev_stop().

This follows this mailing list discussion:
http://dpdk.org/ml/archives/dev/2018-January/087484.html
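
For illustration (not part of the patch), a minimal callback an application
might register, assuming its events carry mbuf pointers:

 #include <rte_eventdev.h>
 #include <rte_mbuf.h>

 static uint64_t flushed;

 /* Invoked once per event drained by rte_event_dev_stop(). */
 static void
 flush_cb(uint8_t dev_id, struct rte_event ev, void *arg)
 {
        uint64_t *count = arg;

        (void)dev_id;
        rte_pktmbuf_free(ev.mbuf);      /* assumes events carry mbufs */
        (*count)++;
 }

 /* Must be called in every process that may call rte_event_dev_stop(). */
 static int
 register_flush_cb(uint8_t dev_id)
 {
        return rte_event_dev_stop_flush_callback_register(dev_id,
                        flush_cb, &flushed);
 }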

Signed-off-by: Gage Eads 
---
 lib/librte_eventdev/rte_eventdev.c   | 20 +++
 lib/librte_eventdev/rte_eventdev.h   | 53 ++--
 lib/librte_eventdev/rte_eventdev_version.map |  6 
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eventdev/rte_eventdev.c 
b/lib/librte_eventdev/rte_eventdev.c
index 851a119..7de44f9 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -1123,6 +1123,26 @@ rte_event_dev_start(uint8_t dev_id)
return 0;
 }
 
+int
+rte_event_dev_stop_flush_callback_register(uint8_t dev_id,
+   eventdev_stop_flush_t callback, void *userdata)
+{
+   struct rte_eventdev *dev;
+
+   RTE_EDEV_LOG_DEBUG("Stop flush register dev_id=%" PRIu8, dev_id);
+
+   RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+   dev = &rte_eventdevs[dev_id];
+
+   if (callback == NULL)
+   return -EINVAL;
+
+   dev->dev_stop_flush = callback;
+   dev->dev_stop_flush_arg = userdata;
+
+   return 0;
+}
+
 void
 rte_event_dev_stop(uint8_t dev_id)
 {
diff --git a/lib/librte_eventdev/rte_eventdev.h 
b/lib/librte_eventdev/rte_eventdev.h
index b21c271..ec3497f 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -835,11 +835,23 @@ int
 rte_event_dev_start(uint8_t dev_id);
 
 /**
- * Stop an event device. The device can be restarted with a call to
- * rte_event_dev_start()
+ * Stop an event device.
+ *
+ * This function causes all queued events to be drained. While draining events
+ * out of the device, this function calls the user-provided flush callback
+ * (if one was registered) once per event.
+ *
+ * This function does not drain events from event ports; the application is
+ * responsible for flushing events from all ports before stopping the device.
+ *
+ * The device can be restarted with a call to rte_event_dev_start(). Threads
+ * that continue to enqueue/dequeue while the device is stopped, or being
+ * stopped, will result in undefined behavior.
  *
  * @param dev_id
  *   Event device identifier.
+ *
+ * @see rte_event_dev_stop_flush_callback_register()
  */
 void
 rte_event_dev_stop(uint8_t dev_id);
@@ -1115,6 +1127,12 @@ typedef uint16_t (*event_dequeue_burst_t)(void *port, 
struct rte_event ev[],
uint16_t nb_events, uint64_t timeout_ticks);
 /**< @internal Dequeue burst of events from port of a device */
 
+typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
+   void *arg);
+/**< Callback function called during rte_event_dev_stop(), invoked once per
+ * flushed event.
+ */
+
 #define RTE_EVENTDEV_NAME_MAX_LEN  (64)
 /**< @internal Max length of name of event PMD */
 
@@ -1176,6 +1194,11 @@ struct rte_eventdev {
event_dequeue_burst_t dequeue_burst;
/**< Pointer to PMD dequeue burst function. */
 
+   eventdev_stop_flush_t dev_stop_flush;
+   /**< Optional, user-provided event flush function */
+   void *dev_stop_flush_arg;
+   /**< User-provided argument for event flush function */
+
struct rte_eventdev_data *data;
/**< Pointer to device data */
const struct rte_eventdev_ops *dev_ops;
@@ -1822,6 +1845,32 @@ rte_event_dev_xstats_reset(uint8_t dev_id,
  */
 int rte_event_dev_selftest(uint8_t dev_id);
 
+/**
+ * Registers a callback function to be invoked during rte_event_dev_stop() for
+ * each flushed event. This function can be used to properly dispose of queued
+ * events, for example events containing memory pointers.
+ *
+ * The callback function is only registered for the calling process. The
+ * callback function must be registered in every process that can call
+ * rte_event_dev_stop().
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param callback
+ *   Callback function invoked once per flushed event.
+ * @param userdata
+ *   Argument supplied to callback.
+ *
+ * @return
+ *  - 0 on success.
+ *  - -EINVAL if *dev_id* is invalid or *callback* is NULL
+ *
+ * @see rte_event_dev_stop()
+ */
+int
+rte_event_dev_stop_flush_callback_register(uint8_t dev_id,
+   eventdev_stop_flush_t callback, void *userdata);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eventdev/rte_eventdev_version.map 
b/lib/librte_eventdev/rte_eventdev_version.map
index 2aef470..4396536 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventd

Re: [dpdk-dev] [PATCH 2/4] bus/vdev: bus scan by multi-process channel

2018-03-05 Thread Tan, Jianfeng
Hi Anatoly,

> -Original Message-
> From: Burakov, Anatoly
> Sent: Monday, March 5, 2018 5:37 PM
> To: Tan, Jianfeng; dev@dpdk.org
> Cc: Richardson, Bruce; Ananyev, Konstantin; tho...@monjalon.net;
> maxime.coque...@redhat.com; Yigit, Ferruh
> Subject: Re: [PATCH 2/4] bus/vdev: bus scan by multi-process channel
> 
> On 04-Mar-18 3:30 PM, Jianfeng Tan wrote:
> > To scan the vdevs in primary, we send request to primary process
> > to obtain the names for vdevs.
> >
> > Only the name is shared from the primary. In probe(), the device
> > driver is supposed to locate (or request more) the detail
> > information from the primary.
> >
> > Signed-off-by: Jianfeng Tan 
> > ---
> 
> Is there much point in having private vdevs? Granted, i'm not exactly a
> heavy user of vdev's, but to me this would seem like a way to introduce
> more confusion. How do i tell which devices are shared between
> processes, and which are private to one process? Can i control which one
> do i get? To me it would seem like it would be better to just switch all
> vdevs to being shared.

Yes, that's the final target: to make every vdev shared between the primary and
secondary processes.

However, most kinds of vdevs do not currently support multi-process. For those
devices:

- If they are first probed in the primary, we will share the
rte_eth_dev_data with the secondary, so that the secondary can get stats or pdump
the port.
- If they are first probed in the secondary, then, since the device is mostly used by
the secondary process, we will allocate the "port id" exclusively and keep it
private to that secondary process.

Thanks,
Jianfeng


[dpdk-dev] [RFC 1/4] drivers/bus/ifpga:Intel FPGA Bus Lib Code

2018-03-05 Thread Rosen Xu
Signed-off-by: Rosen Xu 
---
 drivers/bus/ifpga/Makefile  |  64 
 drivers/bus/ifpga/ifpga_bus.c   | 527 
 drivers/bus/ifpga/ifpga_common.c| 168 +
 drivers/bus/ifpga/ifpga_common.h|  46 +++
 drivers/bus/ifpga/ifpga_logs.h  |  59 
 drivers/bus/ifpga/rte_bus_ifpga.h   | 153 
 drivers/bus/ifpga/rte_bus_ifpga_version.map |   8 +
 7 files changed, 1025 insertions(+)
 create mode 100644 drivers/bus/ifpga/Makefile
 create mode 100644 drivers/bus/ifpga/ifpga_bus.c
 create mode 100644 drivers/bus/ifpga/ifpga_common.c
 create mode 100644 drivers/bus/ifpga/ifpga_common.h
 create mode 100644 drivers/bus/ifpga/ifpga_logs.h
 create mode 100644 drivers/bus/ifpga/rte_bus_ifpga.h
 create mode 100644 drivers/bus/ifpga/rte_bus_ifpga_version.map

diff --git a/drivers/bus/ifpga/Makefile b/drivers/bus/ifpga/Makefile
new file mode 100644
index 000..c71f186
--- /dev/null
+++ b/drivers/bus/ifpga/Makefile
@@ -0,0 +1,64 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_bus_ifpga.a
+LIBABIVER := 1
+EXPORT_MAP := rte_bus_ifpga_version.map
+
+ifeq ($(CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT),y)
+CFLAGS += -O0 -g
+CFLAGS += "-Wno-error"
+else
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+endif
+
+CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga
+CFLAGS += -I$(RTE_SDK)/drivers/bus/pci
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
+#CFLAGS += -I$(RTE_SDK)/lib/librte_rawdev
+#LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring -lrte_rawdev
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+#LDLIBS += -lrte_ethdev
+
+VPATH += $(SRCDIR)/base
+
+SRCS-y += \
+ifpga_bus.c \
+ifpga_common.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
new file mode 100644
index 000..382d550
--- /dev/null
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -0,0 +1,527 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright 2013-2014 6WIND S.A.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,

[dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus

2018-03-05 Thread Rosen Xu
Signed-off-by: Rosen Xu 
---
 lib/librte_eal/common/eal_common_bus.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c 
b/lib/librte_eal/common/eal_common_bus.c
index 3e022d5..74bfa15 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -70,15 +70,27 @@ struct rte_bus_list rte_bus_list =
 rte_bus_scan(void)
 {
int ret;
-   struct rte_bus *bus = NULL;
+   struct rte_bus *bus = NULL, *ifpga_bus = NULL;
 
TAILQ_FOREACH(bus, &rte_bus_list, next) {
+   if (!strcmp(bus->name, "ifpga")) {
+   ifpga_bus = bus;
+   continue;
+   }
+   
ret = bus->scan();
if (ret)
RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
bus->name);
}
 
+   if (ifpga_bus) {
+   ret = ifpga_bus->scan();
+   if (ret)
+   RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
+   ifpga_bus->name);
+   }
+
return 0;
 }
 
-- 
1.8.3.1



[dpdk-dev] [RFC 0/4] Intel FPGA Bus

2018-03-05 Thread Rosen Xu
With Partial Reconfiguration (PR) of parts of the bitstream, a Field Programmable
Gate Array (FPGA) provides not only one kind of accelerator but many types of
accelerators at the same time.

How does DPDK fully support FPGA?
 - We use rawdev to provide FPGA PR
 - The DPDK driver will not bind to the PCI device; it will bind to the FPGA
partial bitstream (AFU, Accelerated Function Unit)
 - For the new device scan and driver probe, we introduce the Intel FPGA bus module

This patchset is base on v18.02.

Rosen Xu (4):
  drivers/bus/ifpga:Intel FPGA Bus Lib Code
  lib/librte_eal/common:Add Intel FPGA Bus Running Command Parse Code
  lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be
scanned after PCI Bus
  drivers/raw/ifpga_rawdev: Rawdev for Intel FPGA Device, it's a PCI
Driver of FPGA Device Manager

 drivers/bus/ifpga/Makefile |  64 +++
 drivers/bus/ifpga/ifpga_bus.c  | 527 +
 drivers/bus/ifpga/ifpga_common.c   | 168 +++
 drivers/bus/ifpga/ifpga_common.h   |  46 ++
 drivers/bus/ifpga/ifpga_logs.h |  59 +++
 drivers/bus/ifpga/rte_bus_ifpga.h  | 153 ++
 drivers/bus/ifpga/rte_bus_ifpga_version.map|   8 +
 drivers/raw/ifpga_rawdev/Makefile  |  59 +++
 drivers/raw/ifpga_rawdev/ifpga_rawdev.c| 343 ++
 drivers/raw/ifpga_rawdev/ifpga_rawdev.h| 109 +
 drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c| 121 +
 .../ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map  |   4 +
 lib/librte_eal/common/eal_common_bus.c |  14 +-
 lib/librte_eal/common/eal_common_options.c |   8 +-
 lib/librte_eal/common/eal_options.h|   2 +
 15 files changed, 1683 insertions(+), 2 deletions(-)
 create mode 100644 drivers/bus/ifpga/Makefile
 create mode 100644 drivers/bus/ifpga/ifpga_bus.c
 create mode 100644 drivers/bus/ifpga/ifpga_common.c
 create mode 100644 drivers/bus/ifpga/ifpga_common.h
 create mode 100644 drivers/bus/ifpga/ifpga_logs.h
 create mode 100644 drivers/bus/ifpga/rte_bus_ifpga.h
 create mode 100644 drivers/bus/ifpga/rte_bus_ifpga_version.map
 create mode 100644 drivers/raw/ifpga_rawdev/Makefile
 create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.c
 create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.h
 create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c
 create mode 100644 drivers/raw/ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map

-- 
1.8.3.1



[dpdk-dev] [RFC 4/4] drivers/raw/ifpga_rawdev: Rawdev for Intel FPGA Device, it's a PCI Driver of FPGA Device Manager

2018-03-05 Thread Rosen Xu
Signed-off-by: Rosen Xu 
---
 drivers/raw/ifpga_rawdev/Makefile  |  59 
 drivers/raw/ifpga_rawdev/ifpga_rawdev.c| 343 +
 drivers/raw/ifpga_rawdev/ifpga_rawdev.h| 109 +++
 drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c| 121 
 .../ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map  |   4 +
 5 files changed, 636 insertions(+)
 create mode 100644 drivers/raw/ifpga_rawdev/Makefile
 create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.c
 create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.h
 create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c
 create mode 100644 drivers/raw/ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map

diff --git a/drivers/raw/ifpga_rawdev/Makefile 
b/drivers/raw/ifpga_rawdev/Makefile
new file mode 100644
index 000..3166fe2
--- /dev/null
+++ b/drivers/raw/ifpga_rawdev/Makefile
@@ -0,0 +1,59 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ifpga_rawdev.a
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga
+CFLAGS += -I$(RTE_SDK)/drivers/raw/ifpga_rawdev
+LDLIBS += -lrte_eal
+LDLIBS += -lrte_rawdev
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lrte_kvargs
+
+EXPORT_MAP := rte_pmd_ifpga_rawdev_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += ifpga_rawdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += ifpga_rawdev_example.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ifpga_rawdev/ifpga_rawdev.c 
b/drivers/raw/ifpga_rawdev/ifpga_rawdev.c
new file mode 100644
index 000..6046711
--- /dev/null
+++ b/drivers/raw/ifpga_rawdev/ifpga_rawdev.c
@@ -0,0 +1,343 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 NXP.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of NXP nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARI

[dpdk-dev] [RFC 2/4] lib/librte_eal/common:Add Intel FPGA Bus Running Command Parse Code

2018-03-05 Thread Rosen Xu
Signed-off-by: Rosen Xu 
---
 lib/librte_eal/common/eal_common_options.c | 8 +++-
 lib/librte_eal/common/eal_options.h| 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 9f2f8d2..1158d21 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -73,6 +73,7 @@
{OPT_VDEV,  1, NULL, OPT_VDEV_NUM },
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
{OPT_VMWARE_TSC_MAP,0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
+   {OPT_IFPGA, 1, NULL, OPT_IFPGA_NUM},
{0, 0, NULL, 0}
 };
 
@@ -1160,7 +1161,12 @@ static int xdigit2val(unsigned char c)
 
core_parsed = LCORE_OPT_MAP;
break;
-
+case OPT_IFPGA_NUM:
+   if (eal_option_device_add(RTE_DEVTYPE_VIRTUAL,
+   optarg) < 0) {
+   return -1;
+   }
+   break;
/* don't know what to do, leave this to caller */
default:
return 1;
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index e86c711..bdbb2c4 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -55,6 +55,8 @@ enum {
OPT_VFIO_INTR_NUM,
 #define OPT_VMWARE_TSC_MAP"vmware-tsc-map"
OPT_VMWARE_TSC_MAP_NUM,
+#define OPT_IFPGA  "ifpga"
+   OPT_IFPGA_NUM,
OPT_LONG_MAX_NUM
 };
 
-- 
1.8.3.1



[dpdk-dev] [PATCH v2 01/10] net/enic: remove 'extern' in .h file function declarations

2018-03-05 Thread John Daley
Signed-off-by: John Daley 
Reviewed-by: Hyong Youb Kim 
---
 drivers/net/enic/enic.h | 80 -
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index c083985ee..e88af6bc9 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -220,54 +220,54 @@ enic_ring_incr(uint32_t n_descriptors, uint32_t idx)
return idx;
 }
 
-extern void enic_fdir_stats_get(struct enic *enic,
-   struct rte_eth_fdir_stats *stats);
-extern int enic_fdir_add_fltr(struct enic *enic,
-   struct rte_eth_fdir_filter *params);
-extern int enic_fdir_del_fltr(struct enic *enic,
-   struct rte_eth_fdir_filter *params);
-extern void enic_free_wq(void *txq);
-extern int enic_alloc_intr_resources(struct enic *enic);
-extern int enic_setup_finish(struct enic *enic);
-extern int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
-   unsigned int socket_id, uint16_t nb_desc);
-extern void enic_start_wq(struct enic *enic, uint16_t queue_idx);
-extern int enic_stop_wq(struct enic *enic, uint16_t queue_idx);
-extern void enic_start_rq(struct enic *enic, uint16_t queue_idx);
-extern int enic_stop_rq(struct enic *enic, uint16_t queue_idx);
-extern void enic_free_rq(void *rxq);
-extern int enic_alloc_rq(struct enic *enic, uint16_t queue_idx,
-   unsigned int socket_id, struct rte_mempool *mp,
-   uint16_t nb_desc, uint16_t free_thresh);
-extern int enic_set_rss_nic_cfg(struct enic *enic);
-extern int enic_set_vnic_res(struct enic *enic);
-extern int enic_enable(struct enic *enic);
-extern int enic_disable(struct enic *enic);
-extern void enic_remove(struct enic *enic);
-extern int enic_get_link_status(struct enic *enic);
-extern int enic_dev_stats_get(struct enic *enic,
-   struct rte_eth_stats *r_stats);
-extern void enic_dev_stats_clear(struct enic *enic);
-extern void enic_add_packet_filter(struct enic *enic);
+void enic_fdir_stats_get(struct enic *enic,
+struct rte_eth_fdir_stats *stats);
+int enic_fdir_add_fltr(struct enic *enic,
+  struct rte_eth_fdir_filter *params);
+int enic_fdir_del_fltr(struct enic *enic,
+  struct rte_eth_fdir_filter *params);
+void enic_free_wq(void *txq);
+int enic_alloc_intr_resources(struct enic *enic);
+int enic_setup_finish(struct enic *enic);
+int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
+ unsigned int socket_id, uint16_t nb_desc);
+void enic_start_wq(struct enic *enic, uint16_t queue_idx);
+int enic_stop_wq(struct enic *enic, uint16_t queue_idx);
+void enic_start_rq(struct enic *enic, uint16_t queue_idx);
+int enic_stop_rq(struct enic *enic, uint16_t queue_idx);
+void enic_free_rq(void *rxq);
+int enic_alloc_rq(struct enic *enic, uint16_t queue_idx,
+ unsigned int socket_id, struct rte_mempool *mp,
+ uint16_t nb_desc, uint16_t free_thresh);
+int enic_set_rss_nic_cfg(struct enic *enic);
+int enic_set_vnic_res(struct enic *enic);
+int enic_enable(struct enic *enic);
+int enic_disable(struct enic *enic);
+void enic_remove(struct enic *enic);
+int enic_get_link_status(struct enic *enic);
+int enic_dev_stats_get(struct enic *enic,
+  struct rte_eth_stats *r_stats);
+void enic_dev_stats_clear(struct enic *enic);
+void enic_add_packet_filter(struct enic *enic);
 int enic_set_mac_address(struct enic *enic, uint8_t *mac_addr);
 void enic_del_mac_address(struct enic *enic, int mac_index);
-extern unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq);
-extern void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
- struct rte_mbuf *tx_pkt, unsigned short len,
- uint8_t sop, uint8_t eop, uint8_t cq_entry,
- uint16_t ol_flags, uint16_t vlan_tag);
-
-extern void enic_post_wq_index(struct vnic_wq *wq);
-extern int enic_probe(struct enic *enic);
-extern int enic_clsf_init(struct enic *enic);
-extern void enic_clsf_destroy(struct enic *enic);
+unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq);
+void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
+  struct rte_mbuf *tx_pkt, unsigned short len,
+  uint8_t sop, uint8_t eop, uint8_t cq_entry,
+  uint16_t ol_flags, uint16_t vlan_tag);
+
+void enic_post_wq_index(struct vnic_wq *wq);
+int enic_probe(struct enic *enic);
+int enic_clsf_init(struct enic *enic);
+void enic_clsf_destroy(struct enic *enic);
 uint16_t enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
 uint16_t enic_dummy_recv_pkts(void *rx_queue,
  struct rte_mbuf **rx_pkts,
  uint16_t nb_pkts);
 uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
-  uint16_t nb_pkts);
+   uint16_t nb_pkts);
 uint16_t enic_

[dpdk-dev] [PATCH v2 04/10] net/enic: remove the VLAN filter handler

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

VIC does not support VLAN filtering at the moment. The firmware does
accept the filter add/del commands and returns success. But, they are
no-ops. To avoid confusion, remove the filter set handler so the app
sees an error instead of silent failure.

Also, at device configure time, enicpmd_vlan_offload_set would not
print a warning message about unsupported VLAN filtering, because the
caller specified only ETH_VLAN_STRIP_MASK. This is wrong, as we should
attempt to apply all requested offloads at configure time. So, pass all
VLAN offload masks, which triggers a warning message about VLAN
filtering, if requested.

Finally, enicpmd_vlan_offload_set should check both mask and
rxmode.offloads, not just mask.

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 drivers/net/enic/enic_ethdev.c | 34 ++
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index bdbaf4cdf..e5523e311 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -318,40 +318,29 @@ static int enicpmd_dev_rx_queue_setup(struct rte_eth_dev 
*eth_dev,
return enicpmd_dev_setup_intr(enic);
 }
 
-static int enicpmd_vlan_filter_set(struct rte_eth_dev *eth_dev,
-   uint16_t vlan_id, int on)
-{
-   struct enic *enic = pmd_priv(eth_dev);
-   int err;
-
-   ENICPMD_FUNC_TRACE();
-   if (on)
-   err = enic_add_vlan(enic, vlan_id);
-   else
-   err = enic_del_vlan(enic, vlan_id);
-   return err;
-}
-
 static int enicpmd_vlan_offload_set(struct rte_eth_dev *eth_dev, int mask)
 {
struct enic *enic = pmd_priv(eth_dev);
+   uint64_t offloads;
 
ENICPMD_FUNC_TRACE();
 
+   offloads = eth_dev->data->dev_conf.rxmode.offloads;
if (mask & ETH_VLAN_STRIP_MASK) {
-   if (eth_dev->data->dev_conf.rxmode.offloads &
-   DEV_RX_OFFLOAD_VLAN_STRIP)
+   if (offloads & DEV_RX_OFFLOAD_VLAN_STRIP)
enic->ig_vlan_strip_en = 1;
else
enic->ig_vlan_strip_en = 0;
}
 
-   if (mask & ETH_VLAN_FILTER_MASK) {
+   if ((mask & ETH_VLAN_FILTER_MASK) &&
+   (offloads & DEV_RX_OFFLOAD_VLAN_FILTER)) {
dev_warning(enic,
"Configuration of VLAN filter is not supported\n");
}
 
-   if (mask & ETH_VLAN_EXTEND_MASK) {
+   if ((mask & ETH_VLAN_EXTEND_MASK) &&
+   (offloads & DEV_RX_OFFLOAD_VLAN_EXTEND)) {
dev_warning(enic,
"Configuration of extended VLAN is not supported\n");
}
@@ -362,6 +351,7 @@ static int enicpmd_vlan_offload_set(struct rte_eth_dev 
*eth_dev, int mask)
 static int enicpmd_dev_configure(struct rte_eth_dev *eth_dev)
 {
int ret;
+   int mask;
struct enic *enic = pmd_priv(eth_dev);
 
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
@@ -376,7 +366,11 @@ static int enicpmd_dev_configure(struct rte_eth_dev 
*eth_dev)
 
enic->hw_ip_checksum = !!(eth_dev->data->dev_conf.rxmode.offloads &
  DEV_RX_OFFLOAD_CHECKSUM);
-   ret = enicpmd_vlan_offload_set(eth_dev, ETH_VLAN_STRIP_MASK);
+   /* All vlan offload masks to apply the current settings */
+   mask = ETH_VLAN_STRIP_MASK |
+   ETH_VLAN_FILTER_MASK |
+   ETH_VLAN_EXTEND_MASK;
+   ret = enicpmd_vlan_offload_set(eth_dev, mask);
if (ret) {
dev_err(enic, "Failed to configure VLAN offloads\n");
return ret;
@@ -710,7 +704,7 @@ static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.dev_infos_get= enicpmd_dev_info_get,
.dev_supported_ptypes_get = enicpmd_dev_supported_ptypes_get,
.mtu_set  = enicpmd_mtu_set,
-   .vlan_filter_set  = enicpmd_vlan_filter_set,
+   .vlan_filter_set  = NULL,
.vlan_tpid_set= NULL,
.vlan_offload_set = enicpmd_vlan_offload_set,
.vlan_strip_queue_set = NULL,
-- 
2.16.2



[dpdk-dev] [PATCH v2 00/10] enic patchset

2018-03-05 Thread John Daley
v2: rebase, submit as patchset instead of individual patches so they
apply correctly.

Hyong Youb Kim (9):
  net/enic: allow the user to change RSS settings
  net/enic: heed the requested max Rx packet size
  net/enic: remove the VLAN filter handler
  net/enic: add Rx/Tx queue configuration getters
  net/enic: allocate stats DMA buffer upfront during probe
  net/enic: support Rx queue interrupts
  doc: describe Rx bytes counter behavior for enic
  net/enic: use memcpy to avoid strict aliasing warnings
  net/enic: support for meson

John Daley (1):
  net/enic: remove 'extern' in .h file function declarations

 doc/guides/nics/enic.rst  |  16 +-
 doc/guides/nics/features/enic.ini |   3 +
 drivers/net/enic/base/vnic_dev.c  |  24 ++-
 drivers/net/enic/base/vnic_dev.h  |   1 +
 drivers/net/enic/enic.h   | 120 +++-
 drivers/net/enic/enic_clsf.c  |  21 +--
 drivers/net/enic/enic_ethdev.c| 258 ++
 drivers/net/enic/enic_main.c  | 373 ++
 drivers/net/enic/enic_res.c   |  23 ++-
 drivers/net/enic/enic_res.h   |   6 +
 drivers/net/enic/meson.build  |  19 ++
 drivers/net/meson.build   |   2 +-
 12 files changed, 686 insertions(+), 180 deletions(-)
 create mode 100644 drivers/net/enic/meson.build

-- 
2.16.2



[dpdk-dev] [PATCH v2 05/10] net/enic: add Rx/Tx queue configuration getters

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 drivers/net/enic/enic_ethdev.c | 76 --
 1 file changed, 65 insertions(+), 11 deletions(-)

diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index e5523e311..6dd72729e 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -39,6 +39,19 @@ static const struct rte_pci_id pci_id_enic_map[] = {
{.vendor_id = 0, /* sentinel */},
 };
 
+#define ENIC_TX_OFFLOAD_CAPA ( \
+   DEV_TX_OFFLOAD_VLAN_INSERT |\
+   DEV_TX_OFFLOAD_IPV4_CKSUM  |\
+   DEV_TX_OFFLOAD_UDP_CKSUM   |\
+   DEV_TX_OFFLOAD_TCP_CKSUM   |\
+   DEV_TX_OFFLOAD_TCP_TSO)
+
+#define ENIC_RX_OFFLOAD_CAPA ( \
+   DEV_RX_OFFLOAD_VLAN_STRIP | \
+   DEV_RX_OFFLOAD_IPV4_CKSUM | \
+   DEV_RX_OFFLOAD_UDP_CKSUM  | \
+   DEV_RX_OFFLOAD_TCP_CKSUM)
+
 RTE_INIT(enicpmd_init_log);
 static void
 enicpmd_init_log(void)
@@ -473,17 +486,8 @@ static void enicpmd_dev_info_get(struct rte_eth_dev 
*eth_dev,
 */
device_info->max_rx_pktlen = enic_mtu_to_max_rx_pktlen(enic->max_mtu);
device_info->max_mac_addrs = ENIC_MAX_MAC_ADDR;
-   device_info->rx_offload_capa =
-   DEV_RX_OFFLOAD_VLAN_STRIP |
-   DEV_RX_OFFLOAD_IPV4_CKSUM |
-   DEV_RX_OFFLOAD_UDP_CKSUM  |
-   DEV_RX_OFFLOAD_TCP_CKSUM;
-   device_info->tx_offload_capa =
-   DEV_TX_OFFLOAD_VLAN_INSERT |
-   DEV_TX_OFFLOAD_IPV4_CKSUM  |
-   DEV_TX_OFFLOAD_UDP_CKSUM   |
-   DEV_TX_OFFLOAD_TCP_CKSUM   |
-   DEV_TX_OFFLOAD_TCP_TSO;
+   device_info->rx_offload_capa = ENIC_RX_OFFLOAD_CAPA;
+   device_info->tx_offload_capa = ENIC_TX_OFFLOAD_CAPA;
device_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_free_thresh = ENIC_DEFAULT_RX_FREE_THRESH
};
@@ -686,6 +690,54 @@ static int enicpmd_dev_rss_hash_conf_get(struct 
rte_eth_dev *dev,
return 0;
 }
 
+static void enicpmd_dev_rxq_info_get(struct rte_eth_dev *dev,
+uint16_t rx_queue_id,
+struct rte_eth_rxq_info *qinfo)
+{
+   struct enic *enic = pmd_priv(dev);
+   struct vnic_rq *rq_sop;
+   struct vnic_rq *rq_data;
+   struct rte_eth_rxconf *conf;
+   uint16_t sop_queue_idx;
+   uint16_t data_queue_idx;
+
+   ENICPMD_FUNC_TRACE();
+   sop_queue_idx = enic_rte_rq_idx_to_sop_idx(rx_queue_id);
+   data_queue_idx = enic_rte_rq_idx_to_data_idx(rx_queue_id);
+   rq_sop = &enic->rq[sop_queue_idx];
+   rq_data = &enic->rq[data_queue_idx]; /* valid if data_queue_enable */
+   qinfo->mp = rq_sop->mp;
+   qinfo->scattered_rx = rq_sop->data_queue_enable;
+   qinfo->nb_desc = rq_sop->ring.desc_count;
+   if (qinfo->scattered_rx)
+   qinfo->nb_desc += rq_data->ring.desc_count;
+   conf = &qinfo->conf;
+   memset(conf, 0, sizeof(*conf));
+   conf->rx_free_thresh = rq_sop->rx_free_thresh;
+   conf->rx_drop_en = 1;
+   /*
+* Except VLAN stripping (port setting), all the checksum offloads
+* are always enabled.
+*/
+   conf->offloads = ENIC_RX_OFFLOAD_CAPA;
+   if (!enic->ig_vlan_strip_en)
+   conf->offloads &= ~DEV_RX_OFFLOAD_VLAN_STRIP;
+   /* rx_thresh and other fields are not applicable for enic */
+}
+
+static void enicpmd_dev_txq_info_get(struct rte_eth_dev *dev,
+__rte_unused uint16_t tx_queue_id,
+struct rte_eth_txq_info *qinfo)
+{
+   struct enic *enic = pmd_priv(dev);
+
+   ENICPMD_FUNC_TRACE();
+   qinfo->nb_desc = enic->config.wq_desc_count;
+   memset(&qinfo->conf, 0, sizeof(qinfo->conf));
+   qinfo->conf.offloads = ENIC_TX_OFFLOAD_CAPA; /* not configurable */
+   /* tx_thresh, and all the other fields are not applicable for enic */
+}
+
 static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.dev_configure= enicpmd_dev_configure,
.dev_start= enicpmd_dev_start,
@@ -718,6 +770,8 @@ static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.rx_descriptor_done   = NULL,
.tx_queue_setup   = enicpmd_dev_tx_queue_setup,
.tx_queue_release = enicpmd_dev_tx_queue_release,
+   .rxq_info_get = enicpmd_dev_rxq_info_get,
+   .txq_info_get = enicpmd_dev_txq_info_get,
.dev_led_on   = NULL,
.dev_led_off  = NULL,
.flow_ctrl_get= NULL,
-- 
2.16.2



[dpdk-dev] [PATCH v2 03/10] net/enic: heed the requested max Rx packet size

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Currently, enic completely ignores the requested max Rx packet size
(rxmode.max_rx_pkt_len). The desired behavior is that the NIC hardware
drops packets larger than the requested size, even though they are
still smaller than MTU.

Cisco VIC does not have such a feature. But, we can accomplish a
similar (not same) effect by reducing the size of posted receive
buffers. Packets larger than the posted size get truncated, and the
receive handler drops them. This is also how the kernel enic driver
enforces the Rx side MTU.

This workaround works only when scatter mode is *not* used. When
scatter is used, there is currently no way to support
rxmode.max_rx_pkt_len, as the NIC always receives packets up to MTU.

For posterity, add a copious amount of comments regarding the
hardware's drop/receive behavior with respect to max/current MTU.
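
A minimal application-side sketch (not part of this patch) of how the
workaround is expected to be used; port_id, nb_rxq and nb_txq are
assumptions supplied by the caller:

#include <rte_ethdev.h>

/* With scatter disabled, the PMD now sizes the posted Rx buffers to
 * max_rx_pkt_len, so larger frames are truncated by the VIC and dropped
 * in the Rx handler instead of being delivered.
 */
struct rte_eth_conf port_conf = {
	.rxmode = { .max_rx_pkt_len = 1400 },	/* below the vNIC MTU */
};
int ret;

ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);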

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 doc/guides/nics/enic.rst   |  1 +
 drivers/net/enic/enic.h|  7 ++
 drivers/net/enic/enic_ethdev.c |  9 +++-
 drivers/net/enic/enic_main.c   | 49 --
 4 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst
index 4dffce1a6..0e655e9e3 100644
--- a/doc/guides/nics/enic.rst
+++ b/doc/guides/nics/enic.rst
@@ -371,6 +371,7 @@ Known bugs and unsupported features in this release
 - Setting of extended VLAN
 - UDP RSS hashing
 - MTU update only works if Scattered Rx mode is disabled
+- Maximum receive packet length is ignored if Scattered Rx mode is used
 
 Prerequisites
 -
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index d29939c94..1b3813a58 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -162,6 +162,13 @@ struct enic {
union vnic_rss_cpu rss_cpu;
 };
 
+/* Compute ethdev's max packet size from MTU */
+static inline uint32_t enic_mtu_to_max_rx_pktlen(uint32_t mtu)
+{
+   /* ethdev max size includes eth and crc whereas NIC MTU does not */
+   return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
+}
+
 /* Get the CQ index from a Start of Packet(SOP) RQ index */
 static inline unsigned int enic_sop_rq_idx_to_cq_idx(unsigned int sop_idx)
 {
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index cbab7029b..bdbaf4cdf 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -470,7 +470,14 @@ static void enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
device_info->max_rx_queues = enic->conf_rq_count / 2;
device_info->max_tx_queues = enic->conf_wq_count;
device_info->min_rx_bufsize = ENIC_MIN_MTU;
-   device_info->max_rx_pktlen = enic->max_mtu + ETHER_HDR_LEN + 4;
+   /* "Max" mtu is not a typo. HW receives packet sizes up to the
+* max mtu regardless of the current mtu (vNIC's mtu). vNIC mtu is
+* a hint to the driver to size receive buffers accordingly so that
+* larger-than-vnic-mtu packets get truncated.. For DPDK, we let
+* the user decide the buffer size via rxmode.max_rx_pkt_len, basically
+* ignoring vNIC mtu.
+*/
+   device_info->max_rx_pktlen = enic_mtu_to_max_rx_pktlen(enic->max_mtu);
device_info->max_mac_addrs = ENIC_MAX_MAC_ADDR;
device_info->rx_offload_capa =
DEV_RX_OFFLOAD_VLAN_STRIP |
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index f00e816a1..d4f478b5e 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -266,6 +266,8 @@ enic_alloc_rx_queue_mbufs(struct enic *enic, struct vnic_rq *rq)
struct rq_enet_desc *rqd = rq->ring.descs;
unsigned i;
dma_addr_t dma_addr;
+   uint32_t max_rx_pkt_len;
+   uint16_t rq_buf_len;
 
if (!rq->in_use)
return 0;
@@ -273,6 +275,18 @@ enic_alloc_rx_queue_mbufs(struct enic *enic, struct vnic_rq *rq)
dev_debug(enic, "queue %u, allocating %u rx queue mbufs\n", rq->index,
  rq->ring.desc_count);
 
+   /*
+* If *not* using scatter and the mbuf size is smaller than the
+* requested max packet size (max_rx_pkt_len), then reduce the
+* posted buffer size to max_rx_pkt_len. HW still receives packets
+* larger than max_rx_pkt_len, but they will be truncated, which we
+* drop in the rx handler. Not ideal, but better than returning
+* large packets when the user is not expecting them.
+*/
+   max_rx_pkt_len = enic->rte_dev->data->dev_conf.rxmode.max_rx_pkt_len;
+   rq_buf_len = rte_pktmbuf_data_room_size(rq->mp) - RTE_PKTMBUF_HEADROOM;
+   if (max_rx_pkt_len < rq_buf_len && !rq->data_queue_enable)
+   rq_buf_len = max_rx_pkt_len;
for (i = 0; i < rq->ring.desc_count; i++, rqd++) {
mb = rte_mbuf_raw_alloc(rq->mp);
if (mb == NULL) {
@@ -287,7 +301,7 @@ enic_alloc_rx_queue_mbufs(stru

[dpdk-dev] [PATCH v2 06/10] net/enic: allocate stats DMA buffer upfront during probe

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

The driver provides a DMA buffer to the firmware when it requests port
stats. The NIC then fills that buffer with latest stats. Currently,
the driver allocates the DMA buffer the first time it requests stats
and saves it for later use. This can lead to crashes when
primary/secondary processes are involved. For example, the following
sequence crashes the secondary process.

1. Start a primary app that does not call rte_eth_stats_get()
2. dpdk-procinfo -- --stats

dpdk-procinfo crashes while trying to allocate the stats DMA buffer
because the alloc function pointer (vdev.alloc_consistent) is valid
only in the primary process, not in the secondary process.

Overwriting the alloc function pointer in the secondary process is not
an option, as it will simply make the pointer invalid in the primary
process. Instead, allocate the DMA buffer during probe so that only
the primary process does both allocate and free. This allows the
secondary process to dump stats as well.
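
For illustration, a hedged sketch of the secondary-process path that this
fix enables (essentially what dpdk-procinfo does); port_id is assumed to be
a port already started by the primary process:

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

struct rte_eth_stats stats;

/* No longer trips over the primary-only alloc_consistent callback, because
 * the stats DMA buffer was already allocated by the primary at probe time.
 */
if (rte_eth_stats_get(port_id, &stats) == 0)
	printf("port %u: ipackets=%" PRIu64 " ibytes=%" PRIu64 "\n",
	       port_id, stats.ipackets, stats.ibytes);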

Cc: sta...@dpdk.org
Fixes: 9913fbb91df0 ("enic/base: common code")

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 drivers/net/enic/base/vnic_dev.c | 24 ++--
 drivers/net/enic/base/vnic_dev.h |  1 +
 drivers/net/enic/enic_main.c |  9 +
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/enic/base/vnic_dev.c b/drivers/net/enic/base/vnic_dev.c
index 05b595eb8..1f8d222fc 100644
--- a/drivers/net/enic/base/vnic_dev.c
+++ b/drivers/net/enic/base/vnic_dev.c
@@ -587,17 +587,9 @@ int vnic_dev_stats_dump(struct vnic_dev *vdev, struct vnic_stats **stats)
 {
u64 a0, a1;
int wait = 1000;
-   static u32 instance;
-   char name[NAME_MAX];
 
-   if (!vdev->stats) {
-   snprintf((char *)name, sizeof(name),
-   "vnic_stats-%u", instance++);
-   vdev->stats = vdev->alloc_consistent(vdev->priv,
-   sizeof(struct vnic_stats), &vdev->stats_pa, (u8 *)name);
-   if (!vdev->stats)
-   return -ENOMEM;
-   }
+   if (!vdev->stats)
+   return -ENOMEM;
 
*stats = vdev->stats;
a0 = vdev->stats_pa;
@@ -922,6 +914,18 @@ u32 vnic_dev_get_intr_coal_timer_max(struct vnic_dev *vdev)
return vdev->intr_coal_timer_info.max_usec;
 }
 
+int vnic_dev_alloc_stats_mem(struct vnic_dev *vdev)
+{
+   char name[NAME_MAX];
+   static u32 instance;
+
+   snprintf((char *)name, sizeof(name), "vnic_stats-%u", instance++);
+   vdev->stats = vdev->alloc_consistent(vdev->priv,
+sizeof(struct vnic_stats),
+&vdev->stats_pa, (u8 *)name);
+   return vdev->stats == NULL ? -ENOMEM : 0;
+}
+
 void vnic_dev_unregister(struct vnic_dev *vdev)
 {
if (vdev) {
diff --git a/drivers/net/enic/base/vnic_dev.h b/drivers/net/enic/base/vnic_dev.h
index 8c0992063..7e5736b4d 100644
--- a/drivers/net/enic/base/vnic_dev.h
+++ b/drivers/net/enic/base/vnic_dev.h
@@ -165,6 +165,7 @@ struct vnic_dev *vnic_dev_register(struct vnic_dev *vdev,
void *priv, struct rte_pci_device *pdev, struct vnic_dev_bar *bar,
unsigned int num_bars);
 struct rte_pci_device *vnic_dev_get_pdev(struct vnic_dev *vdev);
+int vnic_dev_alloc_stats_mem(struct vnic_dev *vdev);
 int vnic_dev_cmd_init(struct vnic_dev *vdev, int fallback);
 int vnic_dev_get_size(void);
 int vnic_dev_int13(struct vnic_dev *vdev, u64 arg, u32 op);
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index d4f478b5e..bd4447f01 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -1478,6 +1478,15 @@ int enic_probe(struct enic *enic)
enic_alloc_consistent,
enic_free_consistent);
 
+   /*
+* Allocate the consistent memory for stats upfront so both primary and
+* secondary processes can dump stats.
+*/
+   err = vnic_dev_alloc_stats_mem(enic->vdev);
+   if (err) {
+   dev_err(enic, "Failed to allocate cmd memory, aborting\n");
+   goto err_out_unregister;
+   }
/* Issue device open to get device in known state */
err = enic_dev_open(enic);
if (err) {
-- 
2.16.2



[dpdk-dev] [PATCH v2 08/10] doc: describe Rx bytes counter behavior for enic

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 doc/guides/nics/enic.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst
index 7e19cf88a..0bc55936a 100644
--- a/doc/guides/nics/enic.rst
+++ b/doc/guides/nics/enic.rst
@@ -310,6 +310,14 @@ Limitations
 were added. Since there currently is no grouping or priority support,
 'catch-all' filters should be added last.
 
+- **Statistics**
+
+  - ``rx_good_bytes`` (ibytes) always includes VLAN header (4B) and CRC bytes (4B).
+  - When the NIC drops a packet because the Rx queue has no free buffers,
+    ``rx_good_bytes`` still increments by 4B if the packet is not VLAN tagged or
+    VLAN stripping is disabled, or by 8B if the packet is VLAN tagged and stripping
+    is enabled.
+
 How to build the suite
 --
 
-- 
2.16.2



[dpdk-dev] [PATCH v2 02/10] net/enic: allow the user to change RSS settings

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Currently, when more than one receive queue is configured, the driver
always enables RSS with the driver's own default hash type, key, and
RETA. The user is unable to change any of the RSS settings. Address
this by implementing the ethdev RSS API as follows.

Correctly report the RETA size, key size, and supported hash types
through rte_eth_dev_info.

During dev_configure(), initialize RSS according to the device's
mq_mode and rss_conf. Start with the default RETA, and use the default
key unless a custom key is provided.

Add the RETA and rss_conf query/set handlers to let the user change
RSS settings after the initial configuration. The hardware is able to
change hash type, key, and RETA individually. So, the handlers change
only the affected settings.

Refactor/rename several functions in order to make their intentions
clear. For example, remove all traces of RSS from
enicpmd_vlan_offload_set() as it is confusing.
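
A short usage sketch of the handlers added here, from the application side;
port_id and the 40-byte key[] array are assumptions supplied by the caller:

#include <rte_ethdev.h>

struct rte_eth_rss_conf rss_conf = {
	.rss_key = key,
	.rss_key_len = 40,
	/* must be a subset of dev_info.flow_type_rss_offloads */
	.rss_hf = ETH_RSS_IP | ETH_RSS_TCP,
};
int ret;

/* change the hash key and hash types after the initial configuration */
ret = rte_eth_dev_rss_hash_update(port_id, &rss_conf);

/* read the current settings back (served from the enic's cached copy) */
ret = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);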

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 doc/guides/nics/features/enic.ini |   2 +
 drivers/net/enic/enic.h   |  20 +++-
 drivers/net/enic/enic_ethdev.c| 117 ++-
 drivers/net/enic/enic_main.c  | 192 --
 drivers/net/enic/enic_res.c   |  20 
 drivers/net/enic/enic_res.h   |   6 ++
 6 files changed, 301 insertions(+), 56 deletions(-)

diff --git a/doc/guides/nics/features/enic.ini b/doc/guides/nics/features/enic.ini
index 498341f07..e79d7277d 100644
--- a/doc/guides/nics/features/enic.ini
+++ b/doc/guides/nics/features/enic.ini
@@ -15,6 +15,8 @@ Promiscuous mode = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
 RSS hash = Y
+RSS key update   = Y
+RSS reta update  = Y
 SR-IOV   = Y
 VLAN filter  = Y
 CRC offload  = Y
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index e88af6bc9..d29939c94 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -146,6 +146,20 @@ struct enic {
 
LIST_HEAD(enic_flows, rte_flow) flows;
rte_spinlock_t flows_lock;
+
+   /* RSS */
+   uint16_t reta_size;
+   uint8_t hash_key_size;
+   uint64_t flow_type_rss_offloads; /* 0 indicates RSS not supported */
+   /*
+* Keep a copy of current RSS config for queries, as we cannot retrieve
+* it from the NIC.
+*/
+   uint8_t rss_hash_type; /* NIC_CFG_RSS_HASH_TYPE flags */
+   uint8_t rss_enable;
+   uint64_t rss_hf; /* ETH_RSS flags */
+   union vnic_rss_key rss_key;
+   union vnic_rss_cpu rss_cpu;
 };
 
 /* Get the CQ index from a Start of Packet(SOP) RQ index */
@@ -239,8 +253,12 @@ void enic_free_rq(void *rxq);
 int enic_alloc_rq(struct enic *enic, uint16_t queue_idx,
  unsigned int socket_id, struct rte_mempool *mp,
  uint16_t nb_desc, uint16_t free_thresh);
-int enic_set_rss_nic_cfg(struct enic *enic);
 int enic_set_vnic_res(struct enic *enic);
+int enic_init_rss_nic_cfg(struct enic *enic);
+int enic_set_rss_conf(struct enic *enic,
+ struct rte_eth_rss_conf *rss_conf);
+int enic_set_rss_reta(struct enic *enic, union vnic_rss_cpu *rss_cpu);
+int enic_set_vlan_strip(struct enic *enic);
 int enic_enable(struct enic *enic);
 int enic_disable(struct enic *enic);
 void enic_remove(struct enic *enic);
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index d84714efb..cbab7029b 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -345,8 +345,6 @@ static int enicpmd_vlan_offload_set(struct rte_eth_dev *eth_dev, int mask)
else
enic->ig_vlan_strip_en = 0;
}
-   enic_set_rss_nic_cfg(enic);
-
 
if (mask & ETH_VLAN_FILTER_MASK) {
dev_warning(enic,
@@ -358,7 +356,7 @@ static int enicpmd_vlan_offload_set(struct rte_eth_dev *eth_dev, int mask)
"Configuration of extended VLAN is not supported\n");
}
 
-   return 0;
+   return enic_set_vlan_strip(enic);
 }
 
 static int enicpmd_dev_configure(struct rte_eth_dev *eth_dev)
@@ -379,8 +377,16 @@ static int enicpmd_dev_configure(struct rte_eth_dev *eth_dev)
enic->hw_ip_checksum = !!(eth_dev->data->dev_conf.rxmode.offloads &
  DEV_RX_OFFLOAD_CHECKSUM);
ret = enicpmd_vlan_offload_set(eth_dev, ETH_VLAN_STRIP_MASK);
-
-   return ret;
+   if (ret) {
+   dev_err(enic, "Failed to configure VLAN offloads\n");
+   return ret;
+   }
+   /*
+* Initialize RSS with the default reta and key. If the user key is
+* given (rx_adv_conf.rss_conf.rss_key), will use that instead of the
+* default key.
+*/
+   return enic_init_rss_nic_cfg(enic);
 }
 
 /* Start the device.
@@ -480,6 +486,9 @@ static void enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
device_info->default_

[dpdk-dev] [PATCH v2 10/10] net/enic: support for meson

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---

 drivers/net/enic/meson.build | 19 +++
 drivers/net/meson.build  |  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/enic/meson.build

diff --git a/drivers/net/enic/meson.build b/drivers/net/enic/meson.build
new file mode 100644
index 0..bfd4e2373
--- /dev/null
+++ b/drivers/net/enic/meson.build
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Cisco Systems, Inc.
+
+sources = files(
+   'base/vnic_cq.c',
+   'base/vnic_dev.c',
+   'base/vnic_intr.c',
+   'base/vnic_rq.c',
+   'base/vnic_rss.c',
+   'base/vnic_wq.c',
+   'enic_clsf.c',
+   'enic_ethdev.c',
+   'enic_flow.c',
+   'enic_main.c',
+   'enic_res.c',
+   'enic_rxtx.c',
+   )
+deps += ['hash']
+includes += include_directories('base')
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 704cbe3c8..f535baa13 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -2,7 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 drivers = ['af_packet', 'bonding',
-   'e1000', 'fm10k', 'i40e', 'ixgbe',
+   'e1000', 'enic', 'fm10k', 'i40e', 'ixgbe',
'null', 'octeontx', 'pcap', 'ring',
'sfc', 'thunderx']
 std_deps = ['ethdev', 'kvargs'] # 'ethdev' also pulls in mbuf, net, eal etc
-- 
2.16.2



[dpdk-dev] [PATCH v2 07/10] net/enic: support Rx queue interrupts

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Enable rx queue interrupts if the app requests them, and vNIC has
enough interrupt resources. Use interrupt vector 0 for link status and
errors. Use vector 1 for rx queue 0, vector 2 for rx queue 1, and so
on. So, with n rx queues, the vNIC needs to have at least n + 1 interrupts.

For VIC, enabling and disabling rx queue interrupts are simply
mask/unmask operations. VIC's credit based interrupt moderation is not
used, as the app wants to explicitly control when to enable/disable
interrupts.

This version requires MSI-X (vfio-pci). Sharing one interrupt for link
status and rx queues is possible, but is rather complex and has no
user demands.
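
A hedged sketch of the application-side flow this enables, assuming a valid
port_id/queue_id and a port bound to vfio-pci (MSI-X):

#include <rte_ethdev.h>
#include <rte_interrupts.h>

struct rte_epoll_event event;

/* register the Rx queue interrupt with this thread's epoll instance */
rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
			  RTE_INTR_EVENT_ADD, NULL);

rte_eth_dev_rx_intr_enable(port_id, queue_id);	/* PMD: vnic_intr_unmask() */
rte_epoll_wait(RTE_EPOLL_PER_THREAD, &event, 1, -1);
rte_eth_dev_rx_intr_disable(port_id, queue_id);	/* PMD: vnic_intr_mask() */
/* drain the queue with rte_eth_rx_burst(), then loop back and re-enable */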

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 doc/guides/nics/enic.rst  |   7 ++-
 doc/guides/nics/features/enic.ini |   1 +
 drivers/net/enic/enic.h   |  15 -
 drivers/net/enic/enic_ethdev.c|  22 +++
 drivers/net/enic/enic_main.c  | 123 --
 drivers/net/enic/enic_res.c   |   3 +-
 6 files changed, 149 insertions(+), 22 deletions(-)

diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst
index 0e655e9e3..7e19cf88a 100644
--- a/doc/guides/nics/enic.rst
+++ b/doc/guides/nics/enic.rst
@@ -114,11 +114,16 @@ Configuration information
 
   - **Interrupts**
 
-Only one interrupt per vNIC interface should be configured in the UCS
+At least one interrupt per vNIC interface should be configured in the UCS
 manager regardless of the number receive/transmit queues. The ENIC PMD
 uses this interrupt to get information about link status and errors
 in the fast path.
 
+In addition to the interrupt for link status and errors, when using Rx queue
+interrupts, increase the number of configured interrupts so that there is at
+least one interrupt for each Rx queue. For example, if the app uses 3 Rx
+queues and wants to use per-queue interrupts, configure 4 (3 + 1) interrupts.
+
 .. _enic-flow-director:
 
 Flow director support
diff --git a/doc/guides/nics/features/enic.ini b/doc/guides/nics/features/enic.ini
index e79d7277d..ea171a45b 100644
--- a/doc/guides/nics/features/enic.ini
+++ b/doc/guides/nics/features/enic.ini
@@ -6,6 +6,7 @@
 [Features]
 Link status  = Y
 Link status event= Y
+Rx interrupt = Y
 Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 1b3813a58..adccc8ac5 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -49,6 +49,15 @@
 
 #define ENICPMD_FDIR_MAX   64
 
+/*
+ * Interrupt 0: LSC and errors
+ * Interrupt 1: rx queue 0
+ * Interrupt 2: rx queue 1
+ * ...
+ */
+#define ENICPMD_LSC_INTR_OFFSET 0
+#define ENICPMD_RXQ_INTR_OFFSET 1
+
 struct enic_fdir_node {
struct rte_eth_fdir_filter filter;
u16 fltr_id;
@@ -126,9 +135,9 @@ struct enic {
struct vnic_cq *cq;
unsigned int cq_count; /* equals rq_count + wq_count */
 
-   /* interrupt resource */
-   struct vnic_intr intr;
-   unsigned int intr_count;
+   /* interrupt vectors (len = conf_intr_count) */
+   struct vnic_intr *intr;
+   unsigned int intr_count; /* equals enabled interrupts (lsc + rxqs) */
 
/* software counters */
struct enic_soft_stats soft_stats;
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 6dd72729e..2a289b6a4 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -738,6 +738,26 @@ static void enicpmd_dev_txq_info_get(struct rte_eth_dev *dev,
/* tx_thresh, and all the other fields are not applicable for enic */
 }
 
+static int enicpmd_dev_rx_queue_intr_enable(struct rte_eth_dev *eth_dev,
+   uint16_t rx_queue_id)
+{
+   struct enic *enic = pmd_priv(eth_dev);
+
+   ENICPMD_FUNC_TRACE();
+   vnic_intr_unmask(&enic->intr[rx_queue_id + ENICPMD_RXQ_INTR_OFFSET]);
+   return 0;
+}
+
+static int enicpmd_dev_rx_queue_intr_disable(struct rte_eth_dev *eth_dev,
+uint16_t rx_queue_id)
+{
+   struct enic *enic = pmd_priv(eth_dev);
+
+   ENICPMD_FUNC_TRACE();
+   vnic_intr_mask(&enic->intr[rx_queue_id + ENICPMD_RXQ_INTR_OFFSET]);
+   return 0;
+}
+
 static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.dev_configure= enicpmd_dev_configure,
.dev_start= enicpmd_dev_start,
@@ -770,6 +790,8 @@ static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.rx_descriptor_done   = NULL,
.tx_queue_setup   = enicpmd_dev_tx_queue_setup,
.tx_queue_release = enicpmd_dev_tx_queue_release,
+   .rx_queue_intr_enable = enicpmd_dev_rx_queue_intr_enable,
+   .rx_queue_intr_disable = enicpmd_dev_rx_queue_intr_disable,
.rxq_info_get = enicpmd_dev_rxq_info_get,
.txq_info_get = enicpmd_dev_txq_

[dpdk-dev] [PATCH v2 09/10] net/enic: use memcpy to avoid strict aliasing warnings

2018-03-05 Thread John Daley
From: Hyong Youb Kim 

Signed-off-by: Hyong Youb Kim 
Reviewed-by: John Daley 
---
 drivers/net/enic/enic_clsf.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/net/enic/enic_clsf.c b/drivers/net/enic/enic_clsf.c
index 3ef1d0832..9d95201ec 100644
--- a/drivers/net/enic/enic_clsf.c
+++ b/drivers/net/enic/enic_clsf.c
@@ -111,7 +111,6 @@ copy_fltr_v2(struct filter_v2 *fltr, struct rte_eth_fdir_input *input,
 struct rte_eth_fdir_masks *masks)
 {
struct filter_generic_1 *gp = &fltr->u.generic_1;
-   int i;
 
fltr->type = FILTER_DPDK_1;
memset(gp, 0, sizeof(*gp));
@@ -273,18 +272,14 @@ copy_fltr_v2(struct filter_v2 *fltr, struct rte_eth_fdir_input *input,
ipv6_mask.proto = masks->ipv6_mask.proto;
ipv6_val.proto = input->flow.ipv6_flow.proto;
}
-   for (i = 0; i < 4; i++) {
-   *(uint32_t *)&ipv6_mask.src_addr[i * 4] =
-   masks->ipv6_mask.src_ip[i];
-   *(uint32_t *)&ipv6_val.src_addr[i * 4] =
-   input->flow.ipv6_flow.src_ip[i];
-   }
-   for (i = 0; i < 4; i++) {
-   *(uint32_t *)&ipv6_mask.dst_addr[i * 4] =
-   masks->ipv6_mask.src_ip[i];
-   *(uint32_t *)&ipv6_val.dst_addr[i * 4] =
-   input->flow.ipv6_flow.dst_ip[i];
-   }
+   memcpy(ipv6_mask.src_addr, masks->ipv6_mask.src_ip,
+  sizeof(ipv6_mask.src_addr));
+   memcpy(ipv6_val.src_addr, input->flow.ipv6_flow.src_ip,
+  sizeof(ipv6_val.src_addr));
+   memcpy(ipv6_mask.dst_addr, masks->ipv6_mask.dst_ip,
+  sizeof(ipv6_mask.dst_addr));
+   memcpy(ipv6_val.dst_addr, input->flow.ipv6_flow.dst_ip,
+  sizeof(ipv6_val.dst_addr));
if (input->flow.ipv6_flow.tc) {
ipv6_mask.vtc_flow = masks->ipv6_mask.tc << 12;
ipv6_val.vtc_flow = input->flow.ipv6_flow.tc << 12;
-- 
2.16.2



[dpdk-dev] [PATCH v2] net/null:Different mac address support

2018-03-05 Thread Mallesh Koujalagi
After attaching two null devices to OVS, both report the MAC address
"00.00.00.00.00.00". Fix this issue by assigning a different MAC address
to each null device.

Signed-off-by: Mallesh Koujalagi 
---
 drivers/net/null/rte_eth_null.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 9385ffd..599b513 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -85,8 +85,17 @@ struct pmd_internals {
uint8_t rss_key[40];/**< 40-byte hash key. */
 };
 
+static struct ether_addr base_eth_addr = {
+   .addr_bytes = {
+   0x4E /* N */,
+   0x55 /* U */,
+   0x4C /* L */,
+   0x4C /* L */,
+   0x00,
+   0x00
+   }
+};
 
-static struct ether_addr eth_addr = { .addr_bytes = {0} };
 static struct rte_eth_link pmd_link = {
.link_speed = ETH_SPEED_NUM_10G,
.link_duplex = ETH_LINK_FULL_DUPLEX,
@@ -492,6 +501,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
struct rte_eth_dev_data *data = NULL;
struct pmd_internals *internals = NULL;
struct rte_eth_dev *eth_dev = NULL;
+   struct ether_addr *eth_addr = NULL;
 
static const uint8_t default_rss_key[40] = {
0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2, 0x41, 0x67, 
0x25, 0x3D,
@@ -514,12 +524,21 @@ eth_dev_null_create(struct rte_vdev_device *dev,
if (!data)
return -ENOMEM;
 
+   eth_addr = rte_zmalloc_socket(rte_vdev_device_name(dev),
+   sizeof(*eth_addr), 0, dev->device.numa_node);
+   if (eth_addr == NULL) {
+   rte_free(data);
+   return -ENOMEM;
+   }
+
eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internals));
if (!eth_dev) {
+   rte_free(eth_addr);
rte_free(data);
return -ENOMEM;
}
-
+   *eth_addr = base_eth_addr;
+   eth_addr->addr_bytes[5] = eth_dev->data->port_id;
/* now put it all together
 * - store queue data in internals,
 * - store numa_node info in ethdev data
@@ -543,7 +562,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
data->nb_rx_queues = (uint16_t)nb_rx_queues;
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
-   data->mac_addrs = ð_addr;
+   data->mac_addrs = eth_addr;
 
eth_dev->data = data;
eth_dev->dev_ops = &ops;
@@ -662,6 +681,7 @@ rte_pmd_null_remove(struct rte_vdev_device *dev)
if (eth_dev == NULL)
return -1;
 
+   rte_free(eth_dev->data->mac_addrs);
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);
 
-- 
2.7.4



Re: [dpdk-dev] [PATCH v2] net/null:Different mac address support

2018-03-05 Thread Stephen Hemminger
On Mon,  5 Mar 2018 19:35:14 -0800
Mallesh Koujalagi  wrote:

> After attaching two Null device to ovs, seeing "00.00.00.00.00.00" mac
> address for both null devices. Fix this issue, by setting different mac
> address.
> 
> Signed-off-by: Mallesh Koujalagi 
> ---
>  drivers/net/null/rte_eth_null.c | 26 +++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> index 9385ffd..599b513 100644
> --- a/drivers/net/null/rte_eth_null.c
> +++ b/drivers/net/null/rte_eth_null.c
> @@ -85,8 +85,17 @@ struct pmd_internals {
>   uint8_t rss_key[40];/**< 40-byte hash key. */
>  };
>  
> +static struct ether_addr base_eth_addr = {
> + .addr_bytes = {
> + 0x4E /* N */,
> + 0x55 /* U */,
> + 0x4C /* L */,
> + 0x4C /* L */,
> + 0x00,
> + 0x00
> + }
> +};

Cute, but since the first octets of an Ethernet address are the vendor ID (OUI),
it might be confusing.
the group address (multicast) set; and it does have the local
admin address bit set.

You really should be using a random locally assigned value.
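
One possible way to follow that suggestion (untested sketch): rte_ether.h
provides eth_random_addr(), which generates a random address with the
locally administered bit set and the multicast bit cleared, e.g.:

#include <rte_ether.h>

/* instead of copying base_eth_addr and patching the last byte */
eth_random_addr(eth_addr->addr_bytes);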

 
> -static struct ether_addr eth_addr = { .addr_bytes = {0} };
>  static struct rte_eth_link pmd_link = {
>   .link_speed = ETH_SPEED_NUM_10G,
>   .link_duplex = ETH_LINK_FULL_DUPLEX,
> @@ -492,6 +501,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
>   struct rte_eth_dev_data *data = NULL;
>   struct pmd_internals *internals = NULL;
>   struct rte_eth_dev *eth_dev = NULL;
> + struct ether_addr *eth_addr = NULL;
>  
>   static const uint8_t default_rss_key[40] = {
>   0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2, 0x41, 0x67, 
> 0x25, 0x3D,
> @@ -514,12 +524,21 @@ eth_dev_null_create(struct rte_vdev_device *dev,
>   if (!data)
>   return -ENOMEM;
>  
> + eth_addr = rte_zmalloc_socket(rte_vdev_device_name(dev),
> + sizeof(*eth_addr), 0, dev->device.numa_node);
> + if (eth_addr == NULL) {
> + rte_free(data);
> + return -ENOMEM;
> + }
> +
>   eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internals));
>   if (!eth_dev) {
> + rte_free(eth_addr);
>   rte_free(data);
>   return -ENOMEM;
>   }
> -
> + *eth_addr = base_eth_addr;
> + eth_addr->addr_bytes[5] = eth_dev->data->port_id;
>   /* now put it all together
>* - store queue data in internals,
>* - store numa_node info in ethdev data
> @@ -543,7 +562,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
>   data->nb_rx_queues = (uint16_t)nb_rx_queues;
>   data->nb_tx_queues = (uint16_t)nb_tx_queues;
>   data->dev_link = pmd_link;
> - data->mac_addrs = ð_addr;
> + data->mac_addrs = eth_addr;
>  
>   eth_dev->data = data;
>   eth_dev->dev_ops = &ops;
> @@ -662,6 +681,7 @@ rte_pmd_null_remove(struct rte_vdev_device *dev)
>   if (eth_dev == NULL)
>   return -1;
>  
> + rte_free(eth_dev->data->mac_addrs);
>   rte_free(eth_dev->data->dev_private);
>   rte_free(eth_dev->data);
>  



Re: [dpdk-dev] 16.11.5 (LTS) patches review and test

2018-03-05 Thread gowrishankar muthukrishnan

On Monday 05 March 2018 03:42 PM, Luca Boccassi wrote:

On Mon, 2018-03-05 at 11:31 +0530, gowrishankar muthukrishnan wrote:

Hi Luca,
To support i40e on powerpc, we would like the patch below to be merged:

c3def6a8724 net/i40e: implement vector PMD for altivec

I have verified branch 16.11 with the above commit cherry-picked (I needed
to remove the release notes part, which was meant for the 17.05 release; I
hope that is fine here). Could you please merge the above.

Thanks,
Gowrishankar

Hi,

This introduced a new PMD for that architecture, right?

If so I can merge the patch, at the following conditions:

1) It will be disabled by default
2) Support and help in backporting will have to be provided by the
authors for the remaining lifetime of 16.11

Is this OK for you?


Yes, please go ahead.

Thanks,
Gowrishankar


On Monday 26 February 2018 05:04 PM, Luca Boccassi wrote:

Hi all,

Here is a list of patches targeted for LTS release 16.11.5. Please
help review and test. The planned date for the final release is
March
the 5th, pending results from regression tests.
Before that, please shout if anyone has objections with these
patches being applied.

These patches are located at branch 16.11 of dpdk-stable repo:
  http://dpdk.org/browse/dpdk-stable/

Thanks.

Luca Boccassi

---
Ajit Khaparde (6):
    net/bnxt: support new PCI IDs
    net/bnxt: parse checksum offload flags
    net/bnxt: fix group info usage
    net/bnxt: fix broadcast cofiguration
    net/bnxt: fix size of Tx ring in HW
    net/bnxt: fix link speed setting with autoneg off

Akhil Goyal (1):
    examples/ipsec-secgw: fix corner case for SPI value

Alejandro Lucero (3):
    net/nfp: fix MTU settings
    net/nfp: fix jumbo settings
    net/nfp: fix CRC strip check behaviour

Anatoly Burakov (14):
    memzone: fix leak on allocation error
    malloc: protect stats with lock
    malloc: fix end for bounded elements
    vfio: fix enabled check on error
    app/procinfo: add compilation option in config
    test: register test as failed if setup failed
    test/table: fix uninitialized parameter
    test/memzone: fix wrong test
    test/memzone: handle previously allocated memzones
    usertools/devbind: remove unused function
    test/reorder: fix memory leak
    test/ring_perf: fix memory leak
    test/table: fix memory leak
    test/timer_perf: fix memory leak

Andriy Berestovskyy (1):
    keepalive: fix state alignment

Bao-Long Tran (1):
    examples/ip_pipeline: fix timer period unit

Beilei Xing (8):
    net/i40e: fix flow director Rx resource defect
    net/i40e: add warnings when writing global registers
    net/i40e: add debug logs when writing global registers
    net/i40e: fix multiple driver support issue
    net/i40e: fix interrupt conflict when using multi-driver
    net/i40e: fix Rx interrupt
    net/i40e: check multi-driver option parsing
    app/testpmd: fix flow director filter

Chas Williams (1):
    net/bonding: fix setting slave MAC addresses

David Harton (1):
    net/i40e: fix VF reset stats crash

Didier Pallard (1):
    net/virtio: fix incorrect cast

Dustin Lundquist (1):
    examples/exception_path: align stats on cache line

Erez Ferber (1):
    net/mlx5: fix MTU update

Ferruh Yigit (1):
    kni: fix build with kernel 4.15

Fiona Trahe (1):
    crypto/qat: fix null auth algo overwrite

Gowrishankar Muthukrishnan (2):
    eal/ppc: remove the braces in memory barrier macros
    eal/ppc: support sPAPR IOMMU for vfio-pci

Harish Patil (2):
    net/qede: fix to reject config with no Rx queue
    net/qede/base: fix VF LRO tunnel configuration

Hemant Agrawal (4):
    pmdinfogen: fix cross compilation for ARM big endian
    lpm: fix ARM big endian build
    net/i40e: fix ARM big endian build
    net/ixgbe: fix ARM big endian build

Hyong Youb Kim (1):
    net/enic: fix crash due to static max number of queues

Igor Ryzhov (1):
    net/i40e: fix flag for MAC address write

Ilya V. Matveychikov (2):
    eal: update assertion macro
    mbuf: cleanup function to get last segment

Jerin Jacob (3):
    net/thunderx: fix multi segment Tx function return
    test/crypto: fix missing include
    ethdev: fix data alignment

Jerry Lilijun (1):
    net/bonding: fix activated slave in 8023ad mode

Jianfeng Tan (3):
    vhost: fix crash
    net/vhost: fix log messages on create/destroy
    net/virtio-user: fix start with kernel vhost

Junjie Chen (3):
    vhost: fix dequeue zero copy with virtio1
    examples/vhost: fix sending ARP packet to self
    vhost: fix mbuf free

Kefu Chai (1):
    contigmem: fix build on FreeBSD 12

Konstantin Ananyev (1):
    eal/x86: use lock-prefixed instructions for SMP barrier

Liang-Min Larry Wang (1):
    net/ixgbe: improve link state check on VF

Marko Kovacevic (2):
  

Re: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate rte_eth_dev_data privately

2018-03-05 Thread Matan Azrad
Hi Jianfeng

Please see a comment below.

> From: Jianfeng Tan, Sent: Sunday, March 4, 2018 5:30 PM
> We introduced private rte_eth_dev_data to allow vdev to be created both in
> primary process and secondary process(es). This is not friendly to multi-
> process model, for example, it leads to port id contention issue if two
> processes both find the data entry is free.
> 
> And to get stats of primary vdev in secondary, we must allocate from the
> pre-defined array so that we can find it.
> 
> Suggested-by: Bruce Richardson 
> Signed-off-by: Jianfeng Tan 
> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 25 +++--
>  drivers/net/kni/rte_eth_kni.c | 13 ++---
>  drivers/net/null/rte_eth_null.c   | 17 +++--
>  drivers/net/octeontx/octeontx_ethdev.c| 14 ++
>  drivers/net/pcap/rte_eth_pcap.c   | 18 +++---
>  drivers/net/tap/rte_eth_tap.c |  9 +
>  drivers/net/vhost/rte_eth_vhost.c | 17 ++---
>  7 files changed, 20 insertions(+), 93 deletions(-)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
> b/drivers/net/af_packet/rte_eth_af_packet.c
> index 57eccfd..2db692f 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -564,25 +564,17 @@ rte_pmd_init_internals(struct rte_vdev_device
> *dev,
>   RTE_LOG(ERR, PMD,
>   "%s: no interface specified for AF_PACKET
> ethdev\n",
>   name);
> - goto error_early;
> + return -1;
>   }
> 
>   RTE_LOG(INFO, PMD,
>   "%s: creating AF_PACKET-backed ethdev on numa socket
> %u\n",
>   name, numa_node);
> 
> - /*
> -  * now do all data allocation - for eth_dev structure, dummy pci
> driver
> -  * and internal (private) data
> -  */
> - data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> - if (data == NULL)
> - goto error_early;
> -
>   *internals = rte_zmalloc_socket(name, sizeof(**internals),
>   0, numa_node);
>   if (*internals == NULL)
> - goto error_early;
> + return -1;
> 
>   for (q = 0; q < nb_queues; q++) {
>   (*internals)->rx_queue[q].map = MAP_FAILED; @@ -604,24
> +596,24 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>   RTE_LOG(ERR, PMD,
>   "%s: I/F name too long (%s)\n",
>   name, pair->value);
> - goto error_early;
> + return -1;
>   }
>   if (ioctl(sockfd, SIOCGIFINDEX, &ifr) == -1) {
>   RTE_LOG(ERR, PMD,
>   "%s: ioctl failed (SIOCGIFINDEX)\n",
>   name);
> - goto error_early;
> + return -1;
>   }
>   (*internals)->if_name = strdup(pair->value);
>   if ((*internals)->if_name == NULL)
> - goto error_early;
> + return -1;
>   (*internals)->if_index = ifr.ifr_ifindex;
> 
>   if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
>   RTE_LOG(ERR, PMD,
>   "%s: ioctl failed (SIOCGIFHWADDR)\n",
>   name);
> - goto error_early;
> + return -1;
>   }
>   memcpy(&(*internals)->eth_addr, ifr.ifr_hwaddr.sa_data,
> ETH_ALEN);
> 
> @@ -775,14 +767,13 @@ rte_pmd_init_internals(struct rte_vdev_device
> *dev,
> 
>   (*internals)->nb_queues = nb_queues;
> 
> - rte_memcpy(data, (*eth_dev)->data, sizeof(*data));
> + data = (*eth_dev)->data;
>   data->dev_private = *internals;
>   data->nb_rx_queues = (uint16_t)nb_queues;
>   data->nb_tx_queues = (uint16_t)nb_queues;
>   data->dev_link = pmd_link;
>   data->mac_addrs = &(*internals)->eth_addr;
> 
> - (*eth_dev)->data = data;
>   (*eth_dev)->dev_ops = &ops;
> 
>   return 0;
> @@ -802,8 +793,6 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
>   }
>   free((*internals)->if_name);
>   rte_free(*internals);
> -error_early:
> - rte_free(data);
>   return -1;
>  }
> 

I think you should remove the private rte_eth_dev_data freeing in rte_pmd_af_packet_remove().
This is relevant to all the vdevs here.

Question:
Does the patch include all the vdevs which allocated private rte_eth_dev_data?
If so, it may solve also part of the issue discussed here:
https://dpdk.org/dev/patchwork/patch/34047/


Matan.

> diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
> index dc4e65f..1a07089 100644
> --- a/drivers/net/kni/rte_eth_kni.c
> +++ b/drivers/net/kni/rte_eth_kni.c
> @@ -337,25 +337,17 @@ eth_kni_create(struct rte_vdev_device *vdev,
>   struct pmd_internals *internals;
>   struct rte_eth_dev_data *data;
>   struct rte_eth_dev *eth_dev;
> - const char *name;
> 
>   RTE_LOG(INFO, PMD, "Creating kni ethdev on numa s

Re: [dpdk-dev] [RFC 1/4] drivers/bus/ifpga:Intel FPGA Bus Lib Code

2018-03-05 Thread Shreyansh Jain
Hello Rosen,

I have some initial (and most of them trivial) comments inline...

On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu  wrote:
> Signed-off-by: Rosen Xu 
> ---
>  drivers/bus/ifpga/Makefile  |  64 
>  drivers/bus/ifpga/ifpga_bus.c   | 527 
> 
>  drivers/bus/ifpga/ifpga_common.c| 168 +
>  drivers/bus/ifpga/ifpga_common.h|  46 +++
>  drivers/bus/ifpga/ifpga_logs.h  |  59 
>  drivers/bus/ifpga/rte_bus_ifpga.h   | 153 
>  drivers/bus/ifpga/rte_bus_ifpga_version.map |   8 +
>  7 files changed, 1025 insertions(+)
>  create mode 100644 drivers/bus/ifpga/Makefile
>  create mode 100644 drivers/bus/ifpga/ifpga_bus.c
>  create mode 100644 drivers/bus/ifpga/ifpga_common.c
>  create mode 100644 drivers/bus/ifpga/ifpga_common.h
>  create mode 100644 drivers/bus/ifpga/ifpga_logs.h
>  create mode 100644 drivers/bus/ifpga/rte_bus_ifpga.h
>  create mode 100644 drivers/bus/ifpga/rte_bus_ifpga_version.map
>
> diff --git a/drivers/bus/ifpga/Makefile b/drivers/bus/ifpga/Makefile
> new file mode 100644
> index 000..c71f186
> --- /dev/null
> +++ b/drivers/bus/ifpga/Makefile
> @@ -0,0 +1,64 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

As of 18.02, I think all licensing has moved to SPDX tags. Maybe in the
formal patch you should change to that.

> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_bus_ifpga.a
> +LIBABIVER := 1
> +EXPORT_MAP := rte_bus_ifpga_version.map
> +
> +ifeq ($(CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT),y)

I think this is copy-paste issue - isn't it?
(CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT)
I see that you have already enabled dynamic logging - in which case
you won't need this anyway.

> +CFLAGS += -O0 -g
> +CFLAGS += "-Wno-error"
> +else
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +endif
> +
> +CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga
> +CFLAGS += -I$(RTE_SDK)/drivers/bus/pci
> +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
> +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
> +#CFLAGS += -I$(RTE_SDK)/lib/librte_rawdev
> +#LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring -lrte_rawdev
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
> +#LDLIBS += -lrte_ethdev
> +
> +VPATH += $(SRCDIR)/base
> +
> +SRCS-y += \
> +ifpga_bus.c \
> +ifpga_common.c
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
> new file mode 100644
> index 000..382d550
> --- /dev/null
> +++ b/drivers/bus/ifpga/ifpga_bus.c
> @@ -0,0 +1,527 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   Copyright 2013-2014 6WIND S.A.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or 

Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus

2018-03-05 Thread Shreyansh Jain
On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu  wrote:
> Signed-off-by: Rosen Xu 
> ---
>  lib/librte_eal/common/eal_common_bus.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c 
> b/lib/librte_eal/common/eal_common_bus.c
> index 3e022d5..74bfa15 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -70,15 +70,27 @@ struct rte_bus_list rte_bus_list =
>  rte_bus_scan(void)
>  {
> int ret;
> -   struct rte_bus *bus = NULL;
> +   struct rte_bus *bus = NULL, *ifpga_bus = NULL;
>
> TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +   if (!strcmp(bus->name, "ifpga")) {
> +   ifpga_bus = bus;
> +   continue;
> +   }
> +
> ret = bus->scan();
> if (ret)
> RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
> bus->name);
> }
>
> +   if (ifpga_bus) {
> +   ret = ifpga_bus->scan();
> +   if (ret)
> +   RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
> +   ifpga_bus->name);
> +   }
> +

You are doing this just so that PCI scans are completed *before* ifpga scans?
Well, I understand that this certainly is an issue that we can't yet
define a priority ordering of bus scans.

But I think what you require is simpler:

In the file ifpga_bus.c:

+RTE_REGISTER_BUS(IFPGA_BUS_NAME, rte_ifpga_bus.bus); <== this
...
...
#define RTE_REGISTER_BUS(nm, bus) \
RTE_INIT_PRIO(businitfn_ ##nm, 110); \

If you define your own version of RTE_REGISTER_BUS with the priority
number higher, it would be inserted later in the bus list.
rte_register_bus doesn't do any inherent ordering.
This would save the changes you are doing in the
lib/librte_eal/common/eal_common_bus.c file.

But I think there has to be a better provision of defining priority of
bus scans - I am sure when new devices come in, there would be
possibility of dependencies as in your case.
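
A sketch of that suggestion, mirroring the RTE_REGISTER_BUS expansion quoted
above but with a larger priority value (assuming a larger RTE_INIT_PRIO
number means the constructor runs later, so the ifpga bus lands after the
PCI bus in rte_bus_list); the rte_ifpga_bus object name is taken from the RFC:

#define IFPGA_REGISTER_BUS(nm, bus) \
RTE_INIT_PRIO(ifpgabusinitfn_ ##nm, 120); \
static void ifpgabusinitfn_ ##nm(void) \
{ \
	(bus).name = RTE_STR(nm); \
	rte_bus_register(&bus); \
}

IFPGA_REGISTER_BUS(ifpga, rte_ifpga_bus.bus);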

> return 0;
>  }
>
> --
> 1.8.3.1
>


Re: [dpdk-dev] [RFC v2, 2/2] eventdev: add crypto adapter API header

2018-03-05 Thread Akhil Goyal

Hi Abhinandan,

Sorry for the delayed response, my office network had some issues wrt 
NNTP, so couldn't reply.


On 2/28/2018 2:31 PM, Gujjar, Abhinandan S wrote:

Hi Akhil,



..




+ * crypto operation directly to cryptodev or send it  to the
+ cryptodev
+ * adapter via eventdev, the cryptodev adapter then submits the
+ crypto
+ * operation to the crypto device. The first mode is known as the


The first mode (DEQ) is very clear. In the second mode(ENQ_DEQ),
- How does "worker" submits the crypto work through crypto-adapter?
If I understand it correctly, "workers" always deals with only
cryptodev's
rte_cryptodev_enqueue_burst() API and "service" function in crypto
adapter would be responsible for dequeue() from cryptodev and enqueue to

eventdev?


I understand the need for OP_NEW vs OP_FWD mode difference in both

modes.

Other than that, What makes ENQ_DEQ different? Could you share the
flow for ENQ_DEQ mode with APIs.


/*
Application changes for ENQ_DEQ mode:
-
/* In ENQ_DEQ mode, to enqueue to adapter app
 * has to fill out following details.
 */
struct rte_event_crypto_request *req;
struct rte_crypto_op *op = rte_crypto_op_alloc();

/* fill request info */
req = (void *)((char *)op + op.private_data_offset);
req->cdev_id = 1;
req->queue_pair_id = 1;

/* fill response info */
...

/* send event to crypto adapter */
ev->event_ptr = op;
ev->queue_id = dst_event_qid;
ev->priority = dst_priority;
ev->sched_type = dst_sched_type;
ev->event_type = RTE_EVENT_TYPE_CRYPTODEV;
ev->sub_event_type = sub_event_type;
ev->flow_id = dst_flow_id;
ret = rte_event_enqueue_burst(event_dev_id, event_port_id, ev, 1);


Adapter in ENQ_DEQ mode, submitting crypto ops to cryptodev:
-
n = rte_event_dequeue_burst(event_dev_id, event_port_id, ev,

BATCH_SIZE, time_out);

struct rte_crypto_op *op = ev->event_ptr;
struct rte_event_crypto_request *req = (void *)op +

op.private_data_offset;

cdev_id = req->cdev_id;
qp_id = req->queue_pair_id

ret = rte_cryptodev_enqueue_burst(cdev_id, qp_id, op, 1);


This mode won't work for the HW implementations that I know of, as in HW
implementations the adapter is embedded in HW.
The DEQ mode works. But this would call for two separate pieces of
application logic for the DEQ and ENQ_DEQ modes.
I think it is unavoidable, as the SW scheme has better performance with
ENQ_DEQ mode.


If you think, there is no option other than introducing a capability
in adapter then please create capability in Rx adapter to inform the
adapter capability to the application.

Do we think it is possible to have a scheme with ENQ_DEQ mode where the
application still enqueues to the cryptodev like in DEQ mode, but through the
cryptodev API? I.e. the adapter patches the cryptodev dev->enqueue_burst() to
"eventdev enqueue burst" followed by the existing dev->enqueue_burst().
Something like the existing ethdev rx_burst callback scheme.
This will enable the application to have a unified flow IMO.

Any thoughts from NXP folks?


I see that there is a performance gain on the SW side while using ENQ_DEQ
mode. But since we already have many modes in the application, can we make
this one work with some callback to cryptodev?

So the application can call rte_cryptodev_enqueue_burst() as it does today,
and if the ENQ_DEQ mode is supported by the underlying implementation, it
can register a callback to the implementation in the driver layer itself.

In ENQ-DEQ mode, crypto requests are sent through the eventdev.
With your proposal, it is not clear how a crypto request can be hidden under
rte_cryptodev_enqueue_burst()!
Can you please share a flow diagram or pseudo code?

-Abhinandan


The code flow is what Jerin also suggested.

"Adapter patches the cryptodev dev->enqueue_burst() to
"eventdev enqueue burst" followed by "exiting dev->enqueue_burst".
Something like exiting ethdev rx_burst callback scheme."
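
A very rough sketch of that idea (purely hypothetical, no such adapter API
exists today): the adapter would save the device's real enqueue function and
patch dev->enqueue_burst with a wrapper, so the application keeps calling
rte_cryptodev_enqueue_burst() unchanged:

#include <rte_cryptodev.h>

static enqueue_pkt_burst_t orig_enqueue;

static uint16_t
adapter_enqueue_wrapper(void *qp, struct rte_crypto_op **ops, uint16_t nb_ops)
{
	/* adapter-specific bookkeeping for the event path would go here */
	return orig_enqueue(qp, ops, nb_ops);
}

/* at adapter setup:
 *	orig_enqueue = dev->enqueue_burst;
 *	dev->enqueue_burst = adapter_enqueue_wrapper;
 */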

The suggestion was just to simplify the flow in the application.
My main concern is that the ipsec-secgw application already has a lot of
modes, and we are about to add two more cases.




In this way, the application will become less complex compared to two
parallel implementations for SW and HW. It will also give more flexibility
to the driver implementation.

-Akhil



*/



+ * dequeue only (DEQ) mode  and the second as the enqueue -
+ dequeue


extra space between "mode" and "and"

Ok



+ * (ENQ_DEQ) mode. The choice of mode can be specified when
+ creating
+ * the adapter.
+ * In the latter choice, the cryptodev adapter is able to use
+ * RTE_OP_FORWARD as the event dev enqueue type, this has a
+ performance
+ * advantage in "closed system" eventdevs like the eventdev SW PMD
+ and










Re: [dpdk-dev] [RFC 4/4] drivers/raw/ifpga_rawdev: Rawdev for Intel FPGA Device, it's a PCI Driver of FPGA Device Manager

2018-03-05 Thread Shreyansh Jain
On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu  wrote:
> Signed-off-by: Rosen Xu 
> ---
>  drivers/raw/ifpga_rawdev/Makefile  |  59 
>  drivers/raw/ifpga_rawdev/ifpga_rawdev.c| 343 
> +
>  drivers/raw/ifpga_rawdev/ifpga_rawdev.h| 109 +++
>  drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c| 121 

When rawdev skeleton driver was integrated, Thomas raised this point
of naming 'skeleton_rawdev' rather than just 'skeleton'.
So, rather than 'ifpga_rawdev' rather than 'ifpga'.
At that time I thought we could use  as
model. But, frankly, to me it seems a bad choice now. Extra '_rawdev'
doesn't serve any purpose here.

So, feel free to change your naming to a more appropriate
"drivers/raw/ifpga/" or "drivers/raw/ifpga_sample" etc.

Probably I too can change the skeleton_rawdev to skeleton.

>  .../ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map  |   4 +
>  5 files changed, 636 insertions(+)
>  create mode 100644 drivers/raw/ifpga_rawdev/Makefile
>  create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.c
>  create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.h
>  create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c
>  create mode 100644 drivers/raw/ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map
>
> diff --git a/drivers/raw/ifpga_rawdev/Makefile 
> b/drivers/raw/ifpga_rawdev/Makefile
> new file mode 100644
> index 000..3166fe2
> --- /dev/null
> +++ b/drivers/raw/ifpga_rawdev/Makefile
> @@ -0,0 +1,59 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +

SPDX identifier in place of BSD boiler-plate.

> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_ifpga_rawdev.a
> +
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga
> +CFLAGS += -I$(RTE_SDK)/drivers/raw/ifpga_rawdev
> +LDLIBS += -lrte_eal
> +LDLIBS += -lrte_rawdev
> +LDLIBS += -lrte_bus_vdev
> +LDLIBS += -lrte_kvargs
> +
> +EXPORT_MAP := rte_pmd_ifpga_rawdev_version.map
> +
> +LIBABIVER := 1
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += ifpga_rawdev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += ifpga_rawdev_example.c

This is a copy-paste issue - CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV

> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/raw/ifpga_rawdev/ifpga_rawdev.c 
> b/drivers/raw/ifpga_rawdev/ifpga_rawdev.c
> new file mode 100644
> index 000..6046711
> --- /dev/null
> +++ b/drivers/raw/ifpga_rawdev/ifpga_rawdev.c
> @@ -0,0 +1,343 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 NXP.

:) - should be Intel.
Even better - SPDX

> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + *   

Re: [dpdk-dev] [RFC v2, 2/2] eventdev: add crypto adapter API header

2018-03-05 Thread Akhil Goyal

Hi Narender,
On 3/4/2018 4:12 AM, Vangati, Narender wrote:

Akhil,
I'm probably missing a point somewhere but I don't follow the suggestions. To 
me, ethdev, cryptodev, eventdev, etc. are device abstractions, whereas the 
proposed ENQ mode isn't at the same level.
The DEQ mode is a device abstraction for cryptodev->eventdev (whether h/w or 
s/w based), but the ENQ part of the adapter is purely a s/w programming model and 
optional to the application. It is independent of any device and it’s an 
application choice whether it wants to use this or not. Nothing prevents the 
application from calling cryptodev_enqueue_burst towards any device directly 
(whether it be soft crypto, NXP, Cavium, QAT, etc ) within an eventdev based 
environment.
The ENQ mode allows an application programming model to be completely event 
based. If the application chooses, it enables the ENQ part where it enqueues an 
rte_event to the s/w adapter and the adapter then calls cryptodev_enqueue_burst 
on its behalf, towards any device PMD which was created.
There are certain benefits to application architecture using this adapter where 
you can leverage the ordered scheduling within eventdev etc., (and certain cons 
where you need to run this service somewhere) but that’s up to the application 
to decide.

In other words, I don’t consider ENQ mode as a device abstraction like 
cryptodev or ethdev where it needs to plug in to something transparently but a 
programming model that is provided as a choice, and that shouldn’t be tied up 
into a device abstraction layer.

vnr


I am not against the eventdev enqueue API, or letting the application decide
whether to use it.
I am trying to limit the number of options in the already multi-option IPsec
use cases. It is getting too confusing.


Also, my concern is that the SW based crypto-event path can follow the same
path as the HW based one in this case. This would help the application to
have common code for both cases.



-Akhil

---


-Original Message-
From: Akhil Goyal [mailto:akhil.go...@nxp.com]
Sent: Monday, February 26, 2018 7:52 AM
To: Jerin Jacob ; Gujjar, Abhinandan S 

Cc: dev@dpdk.org; Vangati, Narender ; Rao, Nikhil 
; Eads, Gage ; hemant.agra...@nxp.com; 
narayanaprasad.athr...@cavium.com; nidadavolu.mur...@cavium.com; nithin.dabilpu...@cavium.com
Subject: Re: [RFC v2, 2/2] eventdev: add crypto adapter API header

Hi Jerin/Abhinandan,

On 2/20/2018 7:29 PM, Jerin Jacob wrote:

-Original Message-

Date: Mon, 19 Feb 2018 10:55:58 +
From: "Gujjar, Abhinandan S" 
To: Jerin Jacob 
CC: "dev@dpdk.org" , "Vangati, Narender"
   , "Rao, Nikhil" , "Eads,
   Gage" , "hemant.agra...@nxp.com"
   , "akhil.go...@nxp.com" ,
   "narayanaprasad.athr...@cavium.com" ,
   "nidadavolu.mur...@cavium.com" ,
   "nithin.dabilpu...@cavium.com" 
Subject: RE: [RFC v2, 2/2] eventdev: add crypto adapter API header

Hi Jerin,


Hi Abhinandan,



Thanks for the review. Please find few comments inline.


-Original Message-
From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
Sent: Saturday, February 17, 2018 1:04 AM
To: Gujjar, Abhinandan S 
Cc: dev@dpdk.org; Vangati, Narender ;
Rao, Nikhil ; Eads, Gage
; hemant.agra...@nxp.com; akhil.go...@nxp.com;
narayanaprasad.athr...@cavium.com; nidadavolu.mur...@cavium.com;
nithin.dabilpu...@cavium.com
Subject: Re: [RFC v2, 2/2] eventdev: add crypto adapter API header

-Original Message-

Date: Mon, 15 Jan 2018 16:23:50 +0530
From: Abhinandan Gujjar 
To: jerin.ja...@caviumnetworks.com
CC: dev@dpdk.org, narender.vang...@intel.com, Abhinandan Gujjar
, Nikhil Rao ,
Gage Eads 
Subject: [RFC v2, 2/2] eventdev: add crypto adapter API header
X-Mailer: git-send-email 1.9.1

+
+/**
+ * This adapter adds support to enqueue crypto completions to event device.
+ * The packet flow from cryptodev to the event device can be accomplished
+ * using both SW and HW based transfer mechanisms.
+ * The adapter uses a EAL service core function for SW based packet transfer
+ * and uses the eventdev PMD functions to configure HW based packet transfer
+ * between the cryptodev and the event device.
+ *
+ * In the case of SW based transfers, application can choose to submit a


I think we can remove "In the case of SW based transfers", as it
should be applicable to the HW case too.

OK. In that case, the adapter will detect the presence of a HW connection
between cryptodev & eventdev and will not dequeue crypto completions.


I would say presence of a "specific capability" instead of HW.
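A rough sketch of how such a capability check could gate the SW dequeue path; the function and flag names below (rte_event_crypto_adapter_caps_get(), RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD) are assumptions in the spirit of the other eventdev adapters, not part of this RFC:

/* If the eventdev/cryptodev pair reports an internal-port (HW forward)
 * capability, crypto completions reach the eventdev without SW help, so the
 * adapter's service function must not dequeue them itself.
 */
static int
adapter_needs_sw_dequeue(uint8_t evdev_id, uint8_t cdev_id)
{
	uint32_t caps = 0;

	if (rte_event_crypto_adapter_caps_get(evdev_id, cdev_id, &caps) < 0)
		return -1;

	return !(caps & RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD);
}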






+ * crypto operation directly to cryptodev or send it to the cryptodev
+ * adapter via eventdev, the cryptodev adapter then submits the crypto
+ * operation to the crypto device. The first mode is known as the


The first mode (DEQ) is very clear. In the second mode (ENQ_DEQ),
- How does the "worker" submit the crypto work through the crypto adapter?
If I understand it correctly, "workers" always deal with only cryptodev's
rte_cryptodev_enqueue_burst() API and the "service" function in the crypto
adapter wo

Re: [dpdk-dev] [RFC 4/4] drivers/raw/ifpga_rawdev: Rawdev for Intel FPGA Device, it's a PCI Driver of FPGA Device Manager

2018-03-05 Thread Shreyansh Jain
Just wanted to rephrase my wording, as it seems to convey a
different meaning from what I was intending.

On Tue, Mar 6, 2018 at 12:18 PM, Shreyansh Jain  wrote:
> On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu  wrote:
>> Signed-off-by: Rosen Xu 
>> ---
>>  drivers/raw/ifpga_rawdev/Makefile  |  59 
>>  drivers/raw/ifpga_rawdev/ifpga_rawdev.c| 343 
>> +
>>  drivers/raw/ifpga_rawdev/ifpga_rawdev.h| 109 +++
>>  drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c| 121 
>
> When rawdev skeleton driver was integrated, Thomas raised this point
> of naming 'skeleton_rawdev' rather than just 'skeleton'.

Thomas questioned why not 'skeleton', and I stuck to 'skeleton_rawdev',
which, in hindsight, seems to me a bad decision on my part.

> So, rather than 'ifpga_rawdev' rather than 'ifpga'.

So, rather than 'ifpga_rawdev', why not use 'ifpga'?

> At that time I thought we could use  as
> model. But, frankly, to me it seems a bad choice now. Extra '_rawdev'
> doesn't serve any purpose here.
>
> So, feel free to change your naming to a more appropriate
> "drivers/raw/ifpga/" or "drivers/raw/ifpga_sample" etc.
>
> Probably I too can change the skeleton_rawdev to skeleton.

[snip]