Re: [dpdk-dev] [PATCH 2/2] net/mlx5: add Rx and Tx tuning parameters
On Sun, Apr 29, 2018 at 09:03:08PM +0300, Shahaf Shuler wrote: > A new ethdev API was exposed by > commit 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters") > > Enabling the PMD to provide default parameters in case no strict request > from application in order to improve the out of the box experience. > > While the current API lacks the means for the PMD to provide the best > possible value, providing the best default the PMD can guess. > The values are based on Mellanox performance report and depends on the > underlying NIC capabilities. > > Cc: ere...@mellanox.com > Cc: am...@mellanox.com > Cc: ol...@mellanox.com > > Signed-off-by: Shahaf Shuler > --- > drivers/net/mlx5/mlx5_ethdev.c | 51 > ++ > 1 file changed, 51 insertions(+) > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 588d4ba627..78354922b0 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -417,6 +417,56 @@ mlx5_dev_configure(struct rte_eth_dev *dev) > } > > /** > + * Sets default tuning parameters. > + * > + * @param dev > + * Pointer to Ethernet device. > + * @param[out] info > + * Info structure output buffer. > + */ > +static void > +mlx5_set_default_params(struct rte_eth_dev *dev, struct rte_eth_dev_info > *info) > +{ > + struct priv *priv = dev->data->dev_private; > + > + if (priv->link_speed_capa & ETH_LINK_SPEED_100G) { > + if (dev->data->nb_rx_queues <= 2 && > + dev->data->nb_tx_queues <= 2) { > + /* Minimum CPU utilization. */ > + info->default_rxportconf.ring_size = 256; > + info->default_txportconf.ring_size = 256; > + /* Don't care as queue num is set. */ > + info->default_rxportconf.nb_queues = 0; > + info->default_txportconf.nb_queues = 0; > + } else { > + /* Max Throughput. */ > + info->default_rxportconf.ring_size = 2048; > + info->default_txportconf.ring_size = 2048; > + info->default_rxportconf.nb_queues = 16; > + info->default_txportconf.nb_queues = 16; > + } > + } else { > + if (dev->data->nb_rx_queues <= 2 && > + dev->data->nb_tx_queues <= 2) { > + /* Minimum CPU utilization. */ > + info->default_rxportconf.ring_size = 256; > + info->default_txportconf.ring_size = 256; > + /* Don't care as queue num is set. */ > + info->default_rxportconf.nb_queues = 0; > + info->default_txportconf.nb_queues = 0; > + } else { > + /* Max Throughput. */ > + info->default_rxportconf.ring_size = 4096; > + info->default_txportconf.ring_size = 4096; > + info->default_rxportconf.nb_queues = 8; > + info->default_txportconf.nb_queues = 8; > + } > + } > + info->default_rxportconf.burst_size = 64; > + info->default_txportconf.burst_size = 64; This can be fully re-written to simplify and ease the maintenance, default values i.e. "Minimum CPU utilization" are duplicated, this can be used as default values and just tweak in case the amount of queues are different from 2 according to the link speed. > +} > + > +/** > * DPDK callback to get information about the device. > * > * @param dev > @@ -458,6 +508,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct > rte_eth_dev_info *info) > info->hash_key_size = rss_hash_default_key_len; > info->speed_capa = priv->link_speed_capa; > info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK; > + mlx5_set_default_params(dev, info); > } > > /** > -- > 2.12.0 Thanks, -- Nélio Laranjeiro 6WIND
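To make the suggestion above concrete, here is a minimal sketch of the restructured function as it could sit in mlx5_ethdev.c: the low-queue-count ("minimum CPU utilization") values become the unconditional defaults, and only the max-throughput case is tweaked according to the link speed. Struct and field names are taken from the patch itself; this is an illustration of the review comment, not the final code.

static void
mlx5_set_default_params(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
{
	struct priv *priv = dev->data->dev_private;

	/* Minimum CPU utilization defaults, used when <= 2 queues are set. */
	info->default_rxportconf.ring_size = 256;
	info->default_txportconf.ring_size = 256;
	/* Don't care as queue num is already set. */
	info->default_rxportconf.nb_queues = 0;
	info->default_txportconf.nb_queues = 0;

	if (dev->data->nb_rx_queues > 2 || dev->data->nb_tx_queues > 2) {
		/* Max throughput profile, sized by link speed. */
		if (priv->link_speed_capa & ETH_LINK_SPEED_100G) {
			info->default_rxportconf.ring_size = 2048;
			info->default_txportconf.ring_size = 2048;
			info->default_rxportconf.nb_queues = 16;
			info->default_txportconf.nb_queues = 16;
		} else {
			info->default_rxportconf.ring_size = 4096;
			info->default_txportconf.ring_size = 4096;
			info->default_rxportconf.nb_queues = 8;
			info->default_txportconf.nb_queues = 8;
		}
	}
	info->default_rxportconf.burst_size = 64;
	info->default_txportconf.burst_size = 64;
}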
Re: [dpdk-dev] [PATCH 1/2] net/mlx5: fix ethtool link setting call order
On Sun, Apr 29, 2018 at 09:03:07PM +0300, Shahaf Shuler wrote: > According to ethtool_link_setting API recommendation ETHTOOL_GLINKSETTINGS > should be called before ETHTOOL_GSET as the later one deprecated. > > Fixes: f47ba80080ab ("net/mlx5: remove kernel version check") > Cc: nelio.laranje...@6wind.com > > Signed-off-by: Shahaf Shuler Acked-by: Nelio Laranjeiro > --- > drivers/net/mlx5/mlx5_ethdev.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 746b94f734..588d4ba627 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -697,9 +697,9 @@ mlx5_link_update(struct rte_eth_dev *dev, int > wait_to_complete) > time_t start_time = time(NULL); > > do { > - ret = mlx5_link_update_unlocked_gset(dev, &dev_link); > + ret = mlx5_link_update_unlocked_gs(dev, &dev_link); > if (ret) > - ret = mlx5_link_update_unlocked_gs(dev, &dev_link); > + ret = mlx5_link_update_unlocked_gset(dev, &dev_link); > if (ret == 0) > break; > /* Handle wait to complete situation. */ > -- > 2.12.0 > -- Nélio Laranjeiro 6WIND
Re: [dpdk-dev] [PATCH 0/4] support for write combining
Hello Bruce, It should work, because the decision about which kind of mapping to use is made in patch 3 based on the PMD's request. ENA uses only one BAR in WC mode and the two others without caching, so not performing the remap in igb_uio should not spoil anything. I added patch 1 because this variable is also provided outside of DPDK to generic Linux functions. I cannot test it with all drivers, therefore I made it configurable. But in general a combination of WC and non-WC PMDs should work, because of patch 3. Also, if WC is enabled by a PMD (as in patch 4), not all BARs will be mapped that way, but only those which have the prefetchable flag enabled. Best regards, Rafal Kozik
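As background for how such a per-BAR choice can be made, below is a generic sketch (not the actual patch code) of selecting between the uncached and write-combined sysfs resource files. On architectures that support WC mappings, the kernel only creates resourceN_wc for prefetchable BARs, which is why only those BARs can be WC-mapped and the rest fall back to the plain resourceN file.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static void *
map_bar(const char *sysfs_dev_dir, int bar, size_t len, int want_wc,
	int prefetchable)
{
	char path[256];
	void *addr;
	int fd;

	/* resourceN_wc only exists for prefetchable BARs, fall back otherwise */
	if (want_wc && prefetchable)
		snprintf(path, sizeof(path), "%s/resource%d_wc", sysfs_dev_dir, bar);
	else
		snprintf(path, sizeof(path), "%s/resource%d", sysfs_dev_dir, bar);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}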
Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote: On 04/25/2018 01:36 PM, Anatoly Burakov wrote: The original implementation used flock() locks, but was later switched to using fcntl() locks for page locking, because fcntl() locks allow locking parts of a file, which is useful for single-file segments mode, where locking the entire file isn't as useful because we still need to grow and shrink it. However, according to fcntl()'s Ubuntu manpage [1], semantics of fcntl() locks have a giant oversight: This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. Basically, closing *any* fd with an fcntl() lock (which we do because we don't want to leak fd's) will drop the lock completely. So, in this commit, we will be reverting back to using flock() locks everywhere. However, that still leaves the problem of locking parts of a memseg list file in single file segments mode, and we will be solving it with creating separate lock files per each page, and tracking those with flock(). We will also be removing all of this tailq business and replacing it with a simple array - saving a few bytes is not worth the extra hassle of dealing with pointers and potential memory allocation failures. Also, remove the tailq lock since it is not needed - these fd lists are per-process, and within a given process, it is always only one thread handling access to hugetlbfs. So, first one to allocate a segment will create a lockfile, and put a shared lock on it. When we're shrinking the page file, we will be trying to take out a write lock on that lockfile, which would fail if any other process is holding onto the lockfile as well. This way, we can know if we can shrink the segment file. Also, if no other locks are found in the lock list for a given memseg list, the memseg list fd is automatically closed. One other thing to note is, according to flock() Ubuntu manpage [2], upgrading the lock from shared to exclusive is implemented by dropping and reacquiring the lock, which is not atomic and thus would have created race conditions. So, on attempting to perform operations in hugetlbfs, we will take out a writelock on hugetlbfs directory, so that only one process could perform hugetlbfs operations concurrently. [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists") Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime") Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime") Fixes: 2a04139f66b4 ("eal: add single file segments option") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov Acked-by: Bruce Richardson We have a problem with the changeset if EAL option -m or --socket-mem is used. EAL initialization hangs just after EAL: Probing VFIO support... 
strace points to flock(7, LOCK_EX List of file descriptors: # ls /proc/25452/fd -l total 0 lrwx-- 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config lrwx-- 1 root root 64 Apr 28 10:34 4 -> socket:[154166] lrwx-- 1 root root 64 Apr 28 10:34 5 -> socket:[154158] lr-x-- 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages lr-x-- 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages I guess the problem is that there are two /dev/hugepages and it hangs on the second. Ideas how to solve it? Andrew. Seeing similar reports from validation. I'm looking into it. -- Thanks, Anatoly
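For readers unfamiliar with the fcntl() semantics quoted above, the following standalone C program (assuming standard POSIX record-lock behaviour; it is unrelated to the DPDK code itself) demonstrates how closing an unrelated descriptor of the same file silently drops the lock:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	struct flock fl;
	int fd1 = open("/tmp/lock_demo", O_RDWR | O_CREAT, 0600);
	int fd2 = open("/tmp/lock_demo", O_RDWR);

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET; /* l_start/l_len == 0: lock the whole file */
	fcntl(fd1, F_SETLK, &fl);

	close(fd2); /* this silently releases the lock taken via fd1 */

	if (fork() == 0) {
		/* the child is a separate process, so a still-held write lock
		 * would make this F_SETLK fail with EAGAIN/EACCES */
		int fd = open("/tmp/lock_demo", O_RDWR);

		fl.l_type = F_WRLCK;
		if (fcntl(fd, F_SETLK, &fl) == 0)
			printf("child got the write lock: parent's lock was dropped\n");
		_exit(0);
	}
	wait(NULL);
	return 0;
}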
Re: [dpdk-dev] [PATCH 0/4] support for write combining
On Mon, Apr 30, 2018 at 10:07:07AM +0200, Rafał Kozik wrote: > Hello Bruce, > > It should work because decision about kind of mapping is made in patch > 3 based on PMD request. > > ENA use only one BAR in wc mode and two other without caching, > therefore not making remap in igb_uio rather not spoil anything. > > I added patch 1 because this variable is provided also outside the > DPDK to Linux generic functions. I cannot test it with all drivers, > therefore I make it configurable. > > But general combination of wc and not wc PMD should work, because of patch 3. > Also if WC is enabled by PMD (like in patch 4) not all BARs will by > mapped, but only those which has prefetchable flag enabled. > Sounds good, so in that case I've no objection to the patch. Acked-by: Bruce Richardson
Re: [dpdk-dev] [PATCH v4 3/9] mem: fix potential double close
On Fri, Apr 27, 2018 at 06:07:04PM +0100, Anatoly Burakov wrote: > We were closing descriptor before checking if mapping has > failed, but if it did, we did a second close afterwards. Fix > it by moving closing descriptor to after we've done all error > checks. > > Coverity issue: 272560 > > Fixes: 2a04139f66b4 ("eal: add single file segments option") > Cc: anatoly.bura...@intel.com > > Signed-off-by: Anatoly Burakov > --- > > Notes: > v4: > - Moved fd close to until after all error checks are done > Acked-by: Bruce Richardson
[dpdk-dev] [PATCH 1/2] examples/vhost: fix header copy to discontiguous desc buffer
In the loop to copy virtio-net header to the descriptor buffer, destination pointer was incremented instead of the source pointer. Coverity issue: 277240 Fixes: 82c93a567d3b ("examples/vhost: move to safe GPA translation API") Signed-off-by: Maxime Coquelin --- examples/vhost/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c index 5a965a346..8ea6b36d5 100644 --- a/examples/vhost/virtio_net.c +++ b/examples/vhost/virtio_net.c @@ -103,7 +103,7 @@ enqueue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr, remain -= len; guest_addr += len; - dst += len; + src += len; } desc_chunck_len = desc->len - dev->hdr_len; -- 2.14.3
[dpdk-dev] [PATCH 0/2] Fix enqueueing vnet header in discontiguous desc buffer
This series fixes copying the virtio-net header to a discontiguous descriptor buffer in VA space. The issue was spotted by Coverity for examples/vhost, but the same issue was present in the vhost-user library. Maxime Coquelin (2): examples/vhost: fix header copy to discontiguous desc buffer vhost: fix header copy to discontiguous desc buffer examples/vhost/virtio_net.c | 2 +- lib/librte_vhost/virtio_net.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) -- 2.14.3
[dpdk-dev] [PATCH 2/2] vhost: fix header copy to discontiguous desc buffer
In the loop to copy virtio-net header to the descriptor buffer, destination pointer was incremented instead of the source pointer. Fixes: fb3815cc614d ("vhost: handle virtually non-contiguous buffers in Rx-mrg") Fixes: 6727f5a739b6 ("vhost: handle virtually non-contiguous buffers in Rx") Cc: sta...@dpdk.org Signed-off-by: Maxime Coquelin --- lib/librte_vhost/virtio_net.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5fdd4172b..eed6b0227 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -277,7 +277,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, vhost_log_write(dev, guest_addr, len); remain -= len; guest_addr += len; - dst += len; + src += len; } } @@ -771,7 +771,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, remain -= len; guest_addr += len; - dst += len; + src += len; } } else { PRINT_PACKET(dev, (uintptr_t)hdr_addr, -- 2.14.3
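The pattern being fixed in both patches is the chunk-by-chunk copy of one source buffer into a destination that is not virtually contiguous. A self-contained sketch of that pattern (with hypothetical names, not the vhost code) shows why the source pointer is the one that has to advance:

#include <stdint.h>
#include <string.h>

struct chunk { void *addr; size_t len; };

/* Copy 'hdr' of 'hdr_len' bytes across an array of host-virtually
 * discontiguous destination chunks. The destination is recomputed for
 * every chunk; if the source pointer is not advanced instead, the same
 * leading bytes of the header are written into every chunk.
 */
static void
copy_hdr_chunked(const void *hdr, size_t hdr_len,
		 const struct chunk *chunks, size_t n)
{
	const uint8_t *src = hdr;
	size_t remain = hdr_len;
	size_t i;

	for (i = 0; i < n && remain > 0; i++) {
		size_t len = remain < chunks[i].len ? remain : chunks[i].len;

		memcpy(chunks[i].addr, src, len);
		remain -= len;
		src += len;	/* this is what the patches fix: advance src, not dst */
	}
}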
[dpdk-dev] [PATCH] eal: check if hugedir write lock is already being held
At hugepage info initialization, EAL takes out a write lock on hugetlbfs directories, and drops it after the memory init is finished. However, in non-legacy mode, if "-m" or "--socket-mem" switches are passed, this leads to a deadlock because EAL tries to allocate pages (and thus take out a write lock on hugedir) while still holding a separate hugedir write lock in EAL. Fix it by checking if write lock in hugepage info is active, and not trying to lock the directory if the hugedir fd is valid. Fixes: 1a7dc2252f28 ("mem: revert to using flock and add per-segment lockfiles") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_memalloc.c | 71 ++ 1 file changed, 42 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 00d7886..360d8f7 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -666,7 +666,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) struct alloc_walk_param *wa = arg; struct rte_memseg_list *cur_msl; size_t page_sz; - int cur_idx, start_idx, j, dir_fd; + int cur_idx, start_idx, j, dir_fd = -1; unsigned int msl_idx, need, i; if (msl->page_sz != wa->page_sz) @@ -691,19 +691,24 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) * because file creation and locking operations are not atomic, * and we might be the first or the last ones to use a particular page, * so we need to ensure atomicity of every operation. +* +* during init, we already hold a write lock, so don't try to take out +* another one. */ - dir_fd = open(wa->hi->hugedir, O_RDONLY); - if (dir_fd < 0) { - RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", __func__, - wa->hi->hugedir, strerror(errno)); - return -1; - } - /* blocking writelock */ - if (flock(dir_fd, LOCK_EX)) { - RTE_LOG(ERR, EAL, "%s(): Cannot lock '%s': %s\n", __func__, - wa->hi->hugedir, strerror(errno)); - close(dir_fd); - return -1; + if (wa->hi->lock_descriptor == -1) { + dir_fd = open(wa->hi->hugedir, O_RDONLY); + if (dir_fd < 0) { + RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", + __func__, wa->hi->hugedir, strerror(errno)); + return -1; + } + /* blocking writelock */ + if (flock(dir_fd, LOCK_EX)) { + RTE_LOG(ERR, EAL, "%s(): Cannot lock '%s': %s\n", + __func__, wa->hi->hugedir, strerror(errno)); + close(dir_fd); + return -1; + } } for (i = 0; i < need; i++, cur_idx++) { @@ -742,7 +747,8 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) if (wa->ms) memset(wa->ms, 0, sizeof(*wa->ms) * wa->n_segs); - close(dir_fd); + if (dir_fd >= 0) + close(dir_fd); return -1; } if (wa->ms) @@ -754,7 +760,8 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) wa->segs_allocated = i; if (i > 0) cur_msl->version++; - close(dir_fd); + if (dir_fd >= 0) + close(dir_fd); return 1; } @@ -769,7 +776,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg) struct rte_memseg_list *found_msl; struct free_walk_param *wa = arg; uintptr_t start_addr, end_addr; - int msl_idx, seg_idx, ret, dir_fd; + int msl_idx, seg_idx, ret, dir_fd = -1; start_addr = (uintptr_t) msl->base_va; end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz; @@ -788,19 +795,24 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg) * because file creation and locking operations are not atomic, * and we might be the first or the last ones to use a particular page, * so we need to ensure atomicity of every operation. 
+* +* during init, we already hold a write lock, so don't try to take out +* another one. */ - dir_fd = open(wa->hi->hugedir, O_RDONLY); - if (dir_fd < 0) { - RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", __func__, - wa->hi->hugedir, strerror(errno)); - return -1; - } - /* blocking writelock */ - if (flock(dir_fd, LOCK_EX)) { - RTE_LOG(ERR, EAL, "%s(): Cannot lock '%s': %s\n", __func__, - wa->hi->hugedir, s
[dpdk-dev] [PATCH] vhost/crypto: fix incorrect bracket location
Coverity issue: 277232 Coverity issue: 277237 Fixes: 3bb595ecd682 ("vhost/crypto: add request handler") Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c index c38eb3bb5..3fa50281c 100644 --- a/lib/librte_vhost/vhost_crypto.c +++ b/lib/librte_vhost/vhost_crypto.c @@ -675,8 +675,8 @@ prepare_sym_cipher_op(struct vhost_crypto *vcrypto, struct rte_crypto_op *op, goto error_exit; } if (unlikely(copy_data(rte_pktmbuf_mtod(m_src, uint8_t *), head, - mem, &desc, cipher->para.src_data_len)) - < 0) { + mem, &desc, cipher->para.src_data_len) + < 0)) { ret = VIRTIO_CRYPTO_BADMSG; goto error_exit; } -- 2.13.6
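Why the bracket location matters: with DPDK's usual definition of unlikely(), the macro's result is always 0 or 1, so comparing it against a negative value can never be true and the error path becomes dead code. A small standalone demonstration:

#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)	/* DPDK's usual definition */

static int fails(void) { return -1; }	/* stand-in for copy_data() failing */

int main(void)
{
	/* Misplaced bracket: unlikely() yields 0 or 1, so "< 0" is never
	 * true and the error branch is unreachable. */
	if (unlikely(fails()) < 0)
		printf("never printed\n");

	/* Fixed bracket: the comparison happens before unlikely() sees it. */
	if (unlikely(fails() < 0))
		printf("error path taken as intended\n");

	return 0;
}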
[dpdk-dev] [PATCH] vhost/crypto: fix bracket
Coverity issue: 233232 Coverity issue: 233237 Fixes: 3bb595ecd682 ("vhost/crypto: add request handler") Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/librte_vhost/vhost_crypto.c b/lib/librte_vhost/vhost_crypto.c index c38eb3bb5..a3bce6379 100644 --- a/lib/librte_vhost/vhost_crypto.c +++ b/lib/librte_vhost/vhost_crypto.c @@ -675,8 +675,7 @@ prepare_sym_cipher_op(struct vhost_crypto *vcrypto, struct rte_crypto_op *op, goto error_exit; } if (unlikely(copy_data(rte_pktmbuf_mtod(m_src, uint8_t *), head, - mem, &desc, cipher->para.src_data_len)) - < 0) { + mem, &desc, cipher->para.src_data_len) < 0)) { ret = VIRTIO_CRYPTO_BADMSG; goto error_exit; } -- 2.13.6
Re: [dpdk-dev] [PATCH] vhost/crypto: fix bracket
On 04/30/2018 12:36 PM, Fan Zhang wrote: Coverity issue: 233232 Coverity issue: 233237 Fixes: 3bb595ecd682 ("vhost/crypto: add request handler") Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [PATCH] vhost/crypto: fix incorrect bracket location
On 04/30/2018 12:31 PM, Fan Zhang wrote: Coverity issue: 277232 Coverity issue: 277237 Fixes: 3bb595ecd682 ("vhost/crypto: add request handler") Signed-off-by: Fan Zhang --- lib/librte_vhost/vhost_crypto.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [dpdk-dev] [dpdk-web] [PATCH v2] update stable releases roadmap
25/04/2018 12:03, Luca Boccassi: > On Wed, 2018-04-25 at 09:33 +0100, Ferruh Yigit wrote: > > On 4/20/2018 4:52 PM, Aaron Conole wrote: > > > Kevin Traynor writes: > > > > On 04/18/2018 02:28 PM, Thomas Monjalon wrote: > > > > > 18/04/2018 14:28, Ferruh Yigit: > > > > > > On 4/18/2018 10:14 AM, Thomas Monjalon wrote: > > > > > > > 18/04/2018 11:05, Ferruh Yigit: > > > > > > > > On 4/11/2018 12:28 AM, Thomas Monjalon wrote: > > > > > > > > > - Typically a new stable release version > > > > > > > > > follows a mainline release > > > > > > > > > - by 1-2 weeks, depending on the test results. > > > > > > > > > + The first stable release (.1) of a branch > > > > > > > > > should follow > > > > > > > > > + its mainline release (.0) by at least two > > > > > > > > > months, > > > > > > > > > + after the first release candidate (-rc1) of > > > > > > > > > the next branch. > > > > > > > > > > > > > > > > Hi Thomas, > > > > > > > > > > > > > > > > What this change suggest? To be able to backport patches > > > > > > > > from rc1? > > > > > > > > > > > > > > Yes, it is the proposal we discussed earlier. > > > > > > > We can wait one week after RC1 to get some validation > > > > > > > confirmation. > > > > > > > Do you agree? > > > > > > > > > > > > This has been discussed in tech-board, what I remember the > > > > > > decision was to wait > > > > > > the release to backport patches into stable tree. > > > > > > > > Any minutes? I couldn't find them > > > > > > > > > It was not so clear to me. > > > > > I thought post-rc1 was acceptable. The idea is to speed-up > > > > > stable releases > > > > > pace, especially first release of a series. > > > > > > > > > > > > > > > > > > I think timing of stable releases and bugfix backports to the > > > > stable > > > > branch are two separate items. > > > > > > > > I do think that bugfix backports to stable should happen on a > > > > regular > > > > basis (e.g. every 2 weeks). Otherwise we are back to the > > > > situation where > > > > if there's a bugfix after a DPDK release, a user like (surprise, > > > > surprise) OVS may not be able to use that DPDK version for ~3 > > > > months. > > > > > > > > Someone who wants to get the latest bugfixes can just take the > > > > latest on > > > > the stable branch and importantly, can have confidence that the > > > > community has officially accepted those patches. If someone > > > > requires > > > > stable to be validated, then they have to wait until the release. > > > > > > +1 - this seems to make the most sense to me. Keep the patches > > > flowing, > > > but don't label/tag it until validation. That serves an additional > > > function: developers know their CC's to stable are being processed. > > > > Are stable trees verified? > > Verification is one issue - so far, Intel and ATT have provided time > and resources to do some regression tests, but only at release time > (before tagging). And it has been a manual process. > It would be great if more companies would step up to help - and even > better if regressions could be automated (nightly job?). > > The other issue is deciding when a patch is "good to go" - until now, > the criteria has been "when it's merged into master". 
> So either that criteria needs to change, and another equally > "authoritative" is decided on, or patches should get reviewed and > merged in master more often and more quickly :-P > > We also have not been looking directly at the the various -next trees, > as things are more "in-flux" there and could be reverted, or clash with > changes from other trees - hence why we merge from master. Yes, backporting from master is definitely the right thing to do. Backporting more regularly would also be an improvement. There will always be the question of the bug-free ideal in stable branches. I agree we need more help to validate the stable branches, but realistically it will never be perfect. So the questions are: - How long must we wait before pushing a backport into the stable tree? - How long must we wait before tagging a stable release? I think it is reasonable to push backports one or two weeks after they are in the master branch, assuming master is tested by the community. If a corner case is found later, it will be fixed with another patch. That's why it's important to wait for a validation period (happening after each release candidate) before tagging a stable release. So, if we are aware of a regression in the master branch which has been backported, we can wait a few more days to fix it. The last thing we need to consider before tagging is the validation of the stable release itself. Are we able to run some non-regression tests on the stable branch if it is ready a few days after an RC1?
Re: [dpdk-dev] [v2, 2/6] eventdev: add APIs and PMD callbacks for crypto adapter
> -Original Message- > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com] > Sent: Sunday, April 29, 2018 9:44 PM > To: Gujjar, Abhinandan S > Cc: hemant.agra...@nxp.com; akhil.go...@nxp.com; dev@dpdk.org; Vangati, > Narender ; Rao, Nikhil ; > Eads, Gage > Subject: Re: [v2,2/6] eventdev: add APIs and PMD callbacks for crypto adapter > > -Original Message- > > Date: Tue, 24 Apr 2018 18:13:23 +0530 > > From: Abhinandan Gujjar > > To: jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com, > > akhil.go...@nxp.com, dev@dpdk.org > > CC: narender.vang...@intel.com, abhinandan.guj...@intel.com, > > nikhil@intel.com, gage.e...@intel.com > > Subject: [v2,2/6] eventdev: add APIs and PMD callbacks for crypto > > adapter > > X-Mailer: git-send-email 1.9.1 > > > > Signed-off-by: Abhinandan Gujjar > > --- > > drivers/event/sw/sw_evdev.c| 13 +++ > > lib/librte_eventdev/rte_eventdev.c | 25 + > > lib/librte_eventdev/rte_eventdev.h | 52 + > > lib/librte_eventdev/rte_eventdev_pmd.h | 189 > > + > > 4 files changed, 279 insertions(+) > > > > diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c > > index dcb6551..10f0e1a 100644 > > --- a/drivers/event/sw/sw_evdev.c > > +++ b/drivers/event/sw/sw_evdev.c > > @@ -480,6 +480,17 @@ > > return 0; > > } > > > > +static int > > +sw_crypto_adapter_caps_get(const struct rte_eventdev *dev, > > + const struct rte_cryptodev *cdev, > > + uint32_t *caps) > > +{ > > + RTE_SET_USED(dev); > > + RTE_SET_USED(cdev); > > + *caps = RTE_EVENT_CRYPTO_ADAPTER_SW_CAP; > > + return 0; > > +} > > + > > static void > > sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info > > *info) { @@ -809,6 +820,8 @@ static int32_t > > sw_sched_service_func(void *args) > > > > .timer_adapter_caps_get = > sw_timer_adapter_caps_get, > > > > + .crypto_adapter_caps_get = > sw_crypto_adapter_caps_get, > > + > > .xstats_get = sw_xstats_get, > > .xstats_get_names = sw_xstats_get_names, > > .xstats_get_by_name = sw_xstats_get_by_name, diff -- > git > > a/lib/librte_eventdev/rte_eventdev.c > > b/lib/librte_eventdev/rte_eventdev.c > > index 3f016f4..7ca9fd1 100644 > > --- a/lib/librte_eventdev/rte_eventdev.c > > +++ b/lib/librte_eventdev/rte_eventdev.c > > @@ -29,6 +29,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > #include "rte_eventdev.h" > > #include "rte_eventdev_pmd.h" > > @@ -145,6 +147,29 @@ > > : 0; > > } > > > > +int __rte_experimental > > +rte_event_crypto_adapter_caps_get(uint8_t dev_id, uint8_t cdev_id, > > + uint32_t *caps) > > +{ > > + struct rte_eventdev *dev; > > + struct rte_cryptodev *cdev; > > + > > + RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL); > > + if (!rte_cryptodev_pmd_is_valid_dev(cdev_id)) > > + return -EINVAL; > > + > > + dev = &rte_eventdevs[dev_id]; > > + cdev = rte_cryptodev_pmd_get_dev(cdev_id); > > + > > + if (caps == NULL) > > + return -EINVAL; > > + *caps = 0; > > + > > + return dev->dev_ops->crypto_adapter_caps_get ? 
> > + (*dev->dev_ops->crypto_adapter_caps_get) > > + (dev, cdev, caps) : -ENOTSUP; > > +} > > + > > static inline int > > rte_event_dev_queue_config(struct rte_eventdev *dev, uint8_t > > nb_queues) { diff --git a/lib/librte_eventdev/rte_eventdev.h > > b/lib/librte_eventdev/rte_eventdev.h > > index 8297f24..9822747 100644 > > --- a/lib/librte_eventdev/rte_eventdev.h > > +++ b/lib/librte_eventdev/rte_eventdev.h > > @@ -8,6 +8,8 @@ > > #ifndef _RTE_EVENTDEV_H_ > > #define _RTE_EVENTDEV_H_ > > > > +#include > > + > > /** > > * @file > > * > > @@ -1135,6 +1137,56 @@ struct rte_event { int __rte_experimental > > rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps); > > > > +/* Crypto adapter capability bitmap flag */ > > +#define RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_NEW > 0x1 > > +/**< Flag indicates HW is capable of generating events. > > events in RTE_EVENT_OP_NEW enqueue operation Ok > > > + * Cryptodev will send packets to the event device as new events > > + * using an internal event port. > > + */ > > + > > +#define RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD > 0x2 > > +/**< Flag indicates HW is capable of generating events. > > events in RTE_EVENT_OP_FWD enqueue operation Ok > > > + * Cryptodev will send packets to the event device as forwarded event > > + * using an internal event port. > > + */ > > + > > +#define RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_QP_EV_BIND > 0x4 > > +/**< Flag indicates HW is capable of mapping crypto queue pair to > > + * event queue. > > + */ > > + > > +#define RTE_EVENT_CRYPTO_ADAPTER_CAP_SESSION_PRIVATE_DATA 0x8 > > +/**< Flag indicates HW/SW suports a mech
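As a usage illustration of the capability flags discussed above (based on the v2 API in this series, so names may still change), an application could probe whether the adapter will need an EAL service core roughly as follows:

#include <rte_eventdev.h>

/* Illustrative sketch only: returns 1 if packet transfer between the
 * cryptodev and the eventdev has to go through the adapter's SW service
 * function, 0 if the device pair has an internal event port, <0 on error.
 */
static int
crypto_adapter_needs_service_core(uint8_t evdev_id, uint8_t cdev_id)
{
	uint32_t caps = 0;

	if (rte_event_crypto_adapter_caps_get(evdev_id, cdev_id, &caps) < 0)
		return -1;

	if (caps & (RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_NEW |
		    RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD))
		return 0;
	return 1;
}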
Re: [dpdk-dev] [v2, 3/6] eventdev: add crypto adapter implementation
> -Original Message- > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com] > Sent: Sunday, April 29, 2018 9:53 PM > To: Gujjar, Abhinandan S > Cc: hemant.agra...@nxp.com; akhil.go...@nxp.com; dev@dpdk.org; Vangati, > Narender ; Rao, Nikhil ; > Eads, Gage > Subject: Re: [v2,3/6] eventdev: add crypto adapter implementation > > -Original Message- > > Date: Tue, 24 Apr 2018 18:13:24 +0530 > > From: Abhinandan Gujjar > > To: jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com, > > akhil.go...@nxp.com, dev@dpdk.org > > CC: narender.vang...@intel.com, abhinandan.guj...@intel.com, > > nikhil@intel.com, gage.e...@intel.com > > Subject: [v2,3/6] eventdev: add crypto adapter implementation > > X-Mailer: git-send-email 1.9.1 > > > > Signed-off-by: Abhinandan Gujjar > > Signed-off-by: Nikhil Rao > > Signed-off-by: Gage Eads > > --- > > + > > +/* Per crypto device information */ > > +struct crypto_device_info { > > + /* Pointer to cryptodev */ > > + struct rte_cryptodev *dev; > > + /* Pointer to queue pair info */ > > + struct crypto_queue_pair_info *qpairs; > > + /* Next queue pair to be processed */ > > + uint16_t next_queue_pair_id; > > + /* Set to indicate cryptodev->eventdev packet > > +* transfer uses a hardware mechanism > > +*/ > > + uint8_t internal_event_port; > > + /* Set to indicate processing has been started */ > > + uint8_t dev_started; > > + /* If num_qpairs > 0, the start callback will > > +* be invoked if not already invoked > > +*/ > > + uint16_t num_qpairs; > > +}; > > Looks like it is used in fastpath, if so add the cache alignment. Sure. > > > + > > +/* Per queue pair information */ > > +struct crypto_queue_pair_info { > > + /* Set to indicate queue pair is enabled */ > > + bool qp_enabled; > > + /* Pointer to hold rte_crypto_ops for batching */ > > + struct rte_crypto_op **op_buffer; > > + /* No of crypto ops accumulated */ > > + uint8_t len; > > +}; > > + > > +static struct rte_event_crypto_adapter **event_crypto_adapter; > > + > > +eca_enq_to_cryptodev(struct rte_event_crypto_adapter *adapter, > > +struct rte_event *ev, unsigned int cnt) { > > + struct rte_event_crypto_adapter_stats *stats = &adapter->crypto_stats; > > + union rte_event_crypto_metadata *m_data = NULL; > > + struct crypto_queue_pair_info *qp_info = NULL; > > + struct rte_crypto_op *crypto_op; > > + unsigned int i, n = 0; > > + uint16_t qp_id = 0, len = 0, ret = 0; > > Please review the explicit '0' assignment. I have initialized only those, which are complained by gcc. I will look at it again. If required, I will initialize them separately. Is that ok? 
> > > + uint8_t cdev_id = 0; > > + > > + stats->event_dequeue_count += cnt; > > + > > + for (i = 0; i < cnt; i++) { > > + crypto_op = ev[i].event_ptr; > > + if (crypto_op == NULL) > > + continue; > > + if (crypto_op->sess_type == RTE_CRYPTO_OP_WITH_SESSION) { > > + m_data = > rte_cryptodev_sym_session_get_private_data( > > + crypto_op->sym->session); > > + if (m_data == NULL) { > > + rte_pktmbuf_free(crypto_op->sym->m_src); > > + rte_crypto_op_free(crypto_op); > > + continue; > > + } > > + > > + cdev_id = m_data->request_info.cdev_id; > > + qp_id = m_data->request_info.queue_pair_id; > > + qp_info = &adapter->cdevs[cdev_id].qpairs[qp_id]; > > + if (qp_info == NULL) { > > + rte_pktmbuf_free(crypto_op->sym->m_src); > > + rte_crypto_op_free(crypto_op); > > + continue; > > + } > > + len = qp_info->len; > > + qp_info->op_buffer[len] = crypto_op; > > + len++; > > + > > +int __rte_experimental > > +rte_event_crypto_adapter_queue_pair_add(uint8_t id, > > + uint8_t cdev_id, > > + int32_t queue_pair_id, > > + const struct rte_event_crypto_queue_pair_conf *conf) > { > > + struct rte_event_crypto_adapter *adapter; > > + struct rte_eventdev *dev; > > + struct crypto_device_info *dev_info; > > + uint32_t cap; > > + int ret; > > + > > + RTE_EVENT_CRYPTO_ADAPTER_ID_VALID_OR_ERR_RET(id, -EINVAL); > > + > > + if (!rte_cryptodev_pmd_is_valid_dev(cdev_id)) { > > + RTE_EDEV_LOG_ERR("Invalid dev_id=%" PRIu8, cdev_id); > > + return -EINVAL; > > + } > > + > > + adapter = eca_id_to_adapter(id); > > + if (adapter == NULL) > > + return -EINVAL; > > + > > + dev = &rte_eventdevs[adapter->eventdev_id]; > > + ret = rte_event_crypto_adapter_caps_get(adapter->eventdev_id, > > + cdev_id, > > +
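Regarding the cache-alignment request above, the usual DPDK idiom is to align fast-path structures to a cache line so that adjacent array entries do not share a line (avoiding false sharing when per-device info is indexed from the service function). A hedged sketch of what the agreed change could look like:

#include <stdint.h>
#include <rte_memory.h>

struct crypto_queue_pair_info;	/* as defined in the patch */
struct rte_cryptodev;		/* from rte_cryptodev.h */

/* Per crypto device information, padded to a full cache line. */
struct crypto_device_info {
	struct rte_cryptodev *dev;
	struct crypto_queue_pair_info *qpairs;
	uint16_t next_queue_pair_id;
	uint8_t internal_event_port;
	uint8_t dev_started;
	uint16_t num_qpairs;
} __rte_cache_aligned;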
Re: [dpdk-dev] [v2, 5/6] eventdev: add event crypto adapter to meson build system
> -Original Message- > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com] > Sent: Sunday, April 29, 2018 9:55 PM > To: Gujjar, Abhinandan S > Cc: hemant.agra...@nxp.com; akhil.go...@nxp.com; dev@dpdk.org; Vangati, > Narender ; Rao, Nikhil ; > Eads, Gage > Subject: Re: [v2,5/6] eventdev: add event crypto adapter to meson build system > > -Original Message- > > Date: Tue, 24 Apr 2018 18:13:26 +0530 > > From: Abhinandan Gujjar > > To: jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com, > > akhil.go...@nxp.com, dev@dpdk.org > > CC: narender.vang...@intel.com, abhinandan.guj...@intel.com, > > nikhil@intel.com, gage.e...@intel.com > > Subject: [v2,5/6] eventdev: add event crypto adapter to meson build > > system > > X-Mailer: git-send-email 1.9.1 > > > > Signed-off-by: Abhinandan Gujjar > > --- > > lib/librte_eventdev/meson.build | 8 +--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_eventdev/meson.build > > b/lib/librte_eventdev/meson.build > > Separate patch is not required for meson build. Have it in the same patch for > make based build and ADD each files as when it added in the patch. Should I add changes related to " lib/librte_eventdev/meson.build" as part of crypto adapter implementation? Or you recommend the changes in "eventdev pmd" patch?
[dpdk-dev] [PATCH v2 1/2] mem: check if allocation size is too big
Mapping size is a 64-bit integer, but mmap() will accept size_t for size mappings. A user could request a mapping with an alignment, which would have overflown size_t, so check if (size + alignment) will overflow size_t. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memory.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4c943b0..0ac7b33 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -75,8 +75,13 @@ eal_get_virtual_area(void *requested_addr, size_t *size, do { map_sz = no_align ? *size : *size + page_sz; + if (map_sz > SIZE_MAX) { + RTE_LOG(ERR, EAL, "Map size too big\n"); + rte_errno = E2BIG; + return NULL; + } - mapped_addr = mmap(requested_addr, map_sz, PROT_READ, + mapped_addr = mmap(requested_addr, (size_t)map_sz, PROT_READ, mmap_flags, -1, 0); if (mapped_addr == MAP_FAILED && allow_shrink) *size -= page_sz; -- 2.7.4
[dpdk-dev] [PATCH v2 2/2] mem: unmap unneeded space
When we ask to reserve virtual areas, we usually include alignment in the mapping size, and that memory ends up being wasted. Wasting a gigabyte of VA space while trying to reserve one gigabyte is pretty expensive on 32-bit, so after we're done mapping, unmap unneeded space. Signed-off-by: Anatoly Burakov --- Notes: v2: - Split fix for size_t overflow into separate patch - Improve readability of unmapping code - Added comment explaining why unmapping is done lib/librte_eal/common/eal_common_memory.c | 26 +- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 0ac7b33..60aed4a 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -121,8 +121,32 @@ eal_get_virtual_area(void *requested_addr, size_t *size, RTE_LOG(DEBUG, EAL, "Virtual area found at %p (size = 0x%zx)\n", aligned_addr, *size); - if (unmap) + if (unmap) { munmap(mapped_addr, map_sz); + } else if (!no_align) { + void *map_end, *aligned_end; + size_t before_len, after_len; + + /* when we reserve space with alignment, we add alignment to +* mapping size. On 32-bit, if 1GB alignment was requested, this +* would waste 1GB of address space, which is a luxury we cannot +* afford. so, if alignment was performed, check if any unneeded +* address space can be unmapped back. +*/ + + map_end = RTE_PTR_ADD(mapped_addr, (size_t)map_sz); + aligned_end = RTE_PTR_ADD(aligned_addr, *size); + + /* unmap space before aligned mmap address */ + before_len = RTE_PTR_DIFF(aligned_addr, mapped_addr); + if (before_len > 0) + munmap(mapped_addr, before_len); + + /* unmap space after aligned end mmap address */ + after_len = RTE_PTR_DIFF(map_end, aligned_end); + if (after_len > 0) + munmap(aligned_end, after_len); + } baseaddr_offset += *size; -- 2.7.4
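For reference, the same reserve-then-trim technique in standalone form (a simplified sketch assuming an anonymous mapping and a power-of-two, non-zero alignment; not the EAL code itself): over-reserve by the alignment, compute the aligned start inside the reservation, then give the unused head and tail back with munmap() so that only 'size' bytes of VA stay reserved.

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ALIGN_UP(p, a) \
	((void *)(((uintptr_t)(p) + (a) - 1) & ~((uintptr_t)(a) - 1)))

static void *
reserve_aligned(size_t size, size_t align)
{
	size_t map_sz = size + align;
	char *base, *aligned, *map_end, *aligned_end;

	base = mmap(NULL, map_sz, PROT_READ,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	aligned = ALIGN_UP(base, align);
	map_end = base + map_sz;
	aligned_end = aligned + size;

	if (aligned != base)
		munmap(base, aligned - base);		/* trim the head */
	if (map_end != aligned_end)
		munmap(aligned_end, map_end - aligned_end); /* trim the tail */
	return aligned;
}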
Re: [dpdk-dev] [v2, 5/6] eventdev: add event crypto adapter to meson build system
-Original Message- > Date: Mon, 30 Apr 2018 11:21:38 + > From: "Gujjar, Abhinandan S" > To: Jerin Jacob > CC: "hemant.agra...@nxp.com" , > "akhil.go...@nxp.com" , "dev@dpdk.org" > , "Vangati, Narender" , "Rao, > Nikhil" , "Eads, Gage" > Subject: RE: [v2,5/6] eventdev: add event crypto adapter to meson build > system > > > > > -Original Message- > > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com] > > Sent: Sunday, April 29, 2018 9:55 PM > > To: Gujjar, Abhinandan S > > Cc: hemant.agra...@nxp.com; akhil.go...@nxp.com; dev@dpdk.org; Vangati, > > Narender ; Rao, Nikhil ; > > Eads, Gage > > Subject: Re: [v2,5/6] eventdev: add event crypto adapter to meson build > > system > > > > -Original Message- > > > Date: Tue, 24 Apr 2018 18:13:26 +0530 > > > From: Abhinandan Gujjar > > > To: jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com, > > > akhil.go...@nxp.com, dev@dpdk.org > > > CC: narender.vang...@intel.com, abhinandan.guj...@intel.com, > > > nikhil@intel.com, gage.e...@intel.com > > > Subject: [v2,5/6] eventdev: add event crypto adapter to meson build > > > system > > > X-Mailer: git-send-email 1.9.1 > > > > > > Signed-off-by: Abhinandan Gujjar > > > --- > > > lib/librte_eventdev/meson.build | 8 +--- > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > diff --git a/lib/librte_eventdev/meson.build > > > b/lib/librte_eventdev/meson.build > > > > Separate patch is not required for meson build. Have it in the same patch > > for > > make based build and ADD each files as when it added in the patch. > Should I add changes related to " lib/librte_eventdev/meson.build" as part of > crypto adapter implementation? > Or you recommend the changes in "eventdev pmd" patch? IMO, You can add in second patch where your implementation gets added. Both make based and meson based build enablement you can add it in that patch. >
Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote: On 04/25/2018 01:36 PM, Anatoly Burakov wrote: The original implementation used flock() locks, but was later switched to using fcntl() locks for page locking, because fcntl() locks allow locking parts of a file, which is useful for single-file segments mode, where locking the entire file isn't as useful because we still need to grow and shrink it. However, according to fcntl()'s Ubuntu manpage [1], semantics of fcntl() locks have a giant oversight: This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. Basically, closing *any* fd with an fcntl() lock (which we do because we don't want to leak fd's) will drop the lock completely. So, in this commit, we will be reverting back to using flock() locks everywhere. However, that still leaves the problem of locking parts of a memseg list file in single file segments mode, and we will be solving it with creating separate lock files per each page, and tracking those with flock(). We will also be removing all of this tailq business and replacing it with a simple array - saving a few bytes is not worth the extra hassle of dealing with pointers and potential memory allocation failures. Also, remove the tailq lock since it is not needed - these fd lists are per-process, and within a given process, it is always only one thread handling access to hugetlbfs. So, first one to allocate a segment will create a lockfile, and put a shared lock on it. When we're shrinking the page file, we will be trying to take out a write lock on that lockfile, which would fail if any other process is holding onto the lockfile as well. This way, we can know if we can shrink the segment file. Also, if no other locks are found in the lock list for a given memseg list, the memseg list fd is automatically closed. One other thing to note is, according to flock() Ubuntu manpage [2], upgrading the lock from shared to exclusive is implemented by dropping and reacquiring the lock, which is not atomic and thus would have created race conditions. So, on attempting to perform operations in hugetlbfs, we will take out a writelock on hugetlbfs directory, so that only one process could perform hugetlbfs operations concurrently. [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists") Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime") Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime") Fixes: 2a04139f66b4 ("eal: add single file segments option") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov Acked-by: Bruce Richardson We have a problem with the changeset if EAL option -m or --socket-mem is used. EAL initialization hangs just after EAL: Probing VFIO support... 
strace points to flock(7, LOCK_EX List of file descriptors: # ls /proc/25452/fd -l total 0 lrwx-- 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config lrwx-- 1 root root 64 Apr 28 10:34 4 -> socket:[154166] lrwx-- 1 root root 64 Apr 28 10:34 5 -> socket:[154158] lr-x-- 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages lr-x-- 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages I guess the problem is that there are two /dev/hugepages and it hangs on the second. Ideas how to solve it? Andrew. Hi Andrew, Please try the following patch: http://dpdk.org/dev/patchwork/patch/39166/ This should fix the issue. -- Thanks, Anatoly
Re: [dpdk-dev] [v2, 6/6] doc: add event crypto adapter documentation
> -Original Message- > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com] > Sent: Sunday, April 29, 2018 10:01 PM > To: Gujjar, Abhinandan S > Cc: hemant.agra...@nxp.com; akhil.go...@nxp.com; dev@dpdk.org; Vangati, > Narender ; Rao, Nikhil ; > Eads, Gage > Subject: Re: [v2,6/6] doc: add event crypto adapter documentation > > -Original Message- > > Date: Tue, 24 Apr 2018 18:13:27 +0530 > > From: Abhinandan Gujjar > > To: jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com, > > akhil.go...@nxp.com, dev@dpdk.org > > CC: narender.vang...@intel.com, abhinandan.guj...@intel.com, > > nikhil@intel.com, gage.e...@intel.com > > Subject: [v2,6/6] doc: add event crypto adapter documentation > > X-Mailer: git-send-email 1.9.1 > > > > Add entries in the programmer's guide, API index, maintainer's file > > and release notes for the event crypto adapter. > > > > Signed-off-by: Abhinandan Gujjar > > --- > > + > > +The packet flow from cryptodev to the event device can be > > +accomplished using both SW and HW based transfer mechanisms. > > +The Adapter queries an eventdev PMD to determine which mechanism to be > used. > > +The adapter uses an EAL service core function for SW based packet > > +transfer and uses the eventdev PMD functions to configure HW based > > +packet transfer between the cryptodev and the event device. > > + > > +Crypto adapter uses a new event type called > > +``RTE_EVENT_TYPE_CRYPTODEV`` to indicate the event source. > > + > > I think, we can add diagrams used in rte_event_crypto_adapter.h with sequence > number in SVG format here to make it easier to understand for the end user. Sure > > > > +API Overview > > + > > + > > +This section has a brief introduction to the event crypto adapter APIs. > > +The application is expected to create an adapter which is associated > > +with a single eventdev, then add cryptodev and queue pair to the adapter > instance. > > + > > +Adapter can be started in ``RTE_EVENT_CRYPTO_ADAPTER_DEQ_ONLY`` or > > +``RTE_EVENT_CRYPTO_ADAPTER_ENQ_DEQ`` mode. > > +In first mode, application will submit a crypto operation directly to > > cryptodev. > > +In the second mode, application will send a crypto ops to cryptodev > > +adapter via eventdev. The cryptodev adapter then submits the crypto > > +operation to the crypto device. > > + > > +Create an adapter instance > > +-- > > + > > +An adapter instance is created using > > +``rte_event_crypto_adapter_create()``. This function is called with > > +event device to be associated with the adapter and port configuration > > +for the adapter to setup an event port(if the adapter needs to use a > > service > function). > > + > > +.. code-block:: c > > + > > +int err; > > +uint8_t dev_id, id; > > +struct rte_event_dev_info dev_info; > > +struct rte_event_port_conf conf; > > + enum rte_event_crypto_adapter_mode mode; > > + > > +err = rte_event_dev_info_get(id, &dev_info); > > + > > +conf.new_event_threshold = dev_info.max_num_events; > > +conf.dequeue_depth = dev_info.max_event_port_dequeue_depth; > > +conf.enqueue_depth = dev_info.max_event_port_enqueue_depth; > > + mode = RTE_EVENT_CRYPTO_ADAPTER_ENQ_DEQ; > > +err = rte_event_crypto_adapter_create(id, dev_id, &conf, > > +mode); > > + > > +If the application desires to have finer control of eventdev port > > +allocation and setup, it can use the > ``rte_event_crypto_adapter_create_ext()`` function. > > +The ``rte_event_crypto_adapter_create_ext()`` function is passed as a > > +callback function. 
The callback function is invoked if the adapter > > +needs to use a service function and needs to create an event port for > > +it. The callback is expected to fill the ``struct > > +rte_event_crypto_adapter_conf`` structure passed to it. > > + > > +For ENQ-DEQ mode, the event port created by adapter can be retrived > > +using > > s/retrived/retrieved ? Ok > > > +``rte_event_crypto_adapter_event_port_get()`` API. > > +Application can use this event port to link with event queue on which > > +it enqueue events towards the crypto adapter. > > + > > +.. code-block:: c > > + > > + uint8_t id, evdev, crypto_ev_port_id, app_qid; > > + struct rte_event ev; > > + int ret; > > + > > + ret = rte_event_crypto_adapter_event_port_get(id, > &crypto_ev_port_id); > > + ret = rte_event_queue_setup(evdev, app_qid, NULL); > > + ret = rte_event_port_link(evdev, crypto_ev_port_id, &app_qid, NULL, > > +1); > > + > > + /* Fill in event info and update event_ptr with rte_crypto_op */ > > + memset(&ev, 0, sizeof(ev)); > > + ev.queue_id = app_qid; > > + . > > + . > > + ev.event_ptr = op; > > + ret = rte_event_enqueue_burst(evdev, app_ev_port_id, ev, nb_events); > > + > > + > > +Adding queue pair to the adapter instance > > +- > > + > > +Cryptodev device id and queue pair are created using cryptodev APIs. > >
Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
On 04/30/2018 01:31 PM, Burakov, Anatoly wrote: On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote: On 04/25/2018 01:36 PM, Anatoly Burakov wrote: The original implementation used flock() locks, but was later switched to using fcntl() locks for page locking, because fcntl() locks allow locking parts of a file, which is useful for single-file segments mode, where locking the entire file isn't as useful because we still need to grow and shrink it. However, according to fcntl()'s Ubuntu manpage [1], semantics of fcntl() locks have a giant oversight: This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. Basically, closing *any* fd with an fcntl() lock (which we do because we don't want to leak fd's) will drop the lock completely. So, in this commit, we will be reverting back to using flock() locks everywhere. However, that still leaves the problem of locking parts of a memseg list file in single file segments mode, and we will be solving it with creating separate lock files per each page, and tracking those with flock(). We will also be removing all of this tailq business and replacing it with a simple array - saving a few bytes is not worth the extra hassle of dealing with pointers and potential memory allocation failures. Also, remove the tailq lock since it is not needed - these fd lists are per-process, and within a given process, it is always only one thread handling access to hugetlbfs. So, first one to allocate a segment will create a lockfile, and put a shared lock on it. When we're shrinking the page file, we will be trying to take out a write lock on that lockfile, which would fail if any other process is holding onto the lockfile as well. This way, we can know if we can shrink the segment file. Also, if no other locks are found in the lock list for a given memseg list, the memseg list fd is automatically closed. One other thing to note is, according to flock() Ubuntu manpage [2], upgrading the lock from shared to exclusive is implemented by dropping and reacquiring the lock, which is not atomic and thus would have created race conditions. So, on attempting to perform operations in hugetlbfs, we will take out a writelock on hugetlbfs directory, so that only one process could perform hugetlbfs operations concurrently. [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists") Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime") Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime") Fixes: 2a04139f66b4 ("eal: add single file segments option") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov Acked-by: Bruce Richardson We have a problem with the changeset if EAL option -m or --socket-mem is used. EAL initialization hangs just after EAL: Probing VFIO support... 
strace points to flock(7, LOCK_EX List of file descriptors: # ls /proc/25452/fd -l total 0 lrwx-- 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config lrwx-- 1 root root 64 Apr 28 10:34 4 -> socket:[154166] lrwx-- 1 root root 64 Apr 28 10:34 5 -> socket:[154158] lr-x-- 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages lr-x-- 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages I guess the problem is that there are two /dev/hugepages and it hangs on the second. Ideas how to solve it? Andrew. Hi Andrew, Please try the following patch: http://dpdk.org/dev/patchwork/patch/39166/ This should fix the issue. I faced the regression in my test bench, your patch fixes the issue in my case: Tested-by: Maxime Coquelin Thanks, Maxime
[dpdk-dev] [PATCH v2 0/4] Clean up EAL runtime data paths
As has been suggested [1], all DPDK runtime paths should be put into a single place. This patchset accomplishes exactly that. If running as root, all files will be put under /var/run/dpdk/, otherwise they will be put under $XDG_RUNTIME_DIR/dpdk/, or, if that environment variable is not defined, all files will go under /tmp/dpdk/. [1] http://dpdk.org/dev/patchwork/patch/38688/ v2: - Rebase on rc1 Anatoly Burakov (4): eal: remove unused define eal: rename function returning hugepage data path eal: add directory for DPDK runtime data eal: move all runtime data into DPDK runtime dir lib/librte_eal/bsdapp/eal/eal.c | 70 +++ lib/librte_eal/common/eal_filesystem.h | 81 ++-- lib/librte_eal/linuxapp/eal/eal.c| 69 +++ lib/librte_eal/linuxapp/eal/eal_memory.c | 10 ++-- 4 files changed, 171 insertions(+), 59 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH v2 3/4] eal: add directory for DPDK runtime data
Currently, during runtime, DPDK will store a bunch of files here and there (in /var/run, /tmp or in $HOME). Fix it by creating a DPDK-specific runtime directory, under which all runtime data will be placed. The template for creating this runtime directory is the following: /dpdk// Where is set to either "/var/run" if run as root, or $XDG_RUNTIME_DIR if run as non-root, with a fallback to /tmp if $XDG_RUNTIME_DIR is not defined. So, for example, if run as root, by default all runtime data will be stored at /var/run/dpdk/rte/. There is no equivalent of "mkdir -p", so we will be creating the path step by step. Nothing uses this new path yet, changes for that will come in next commit. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal.c| 68 ++ lib/librte_eal/common/eal_filesystem.h | 8 lib/librte_eal/linuxapp/eal/eal.c | 67 + 3 files changed, 143 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index a63f11f..256ab2d 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -83,6 +84,66 @@ struct internal_config internal_config; /* used by rte_rdtsc() */ int rte_cycles_vmware_tsc_map; +/* platform-specific runtime dir */ +static char runtime_dir[PATH_MAX]; + +int +eal_create_runtime_dir(void) +{ + const char *directory = default_config_dir; + const char *xdg_runtime_dir = getenv("XDG_RUNTIME_DIR"); + const char *fallback = "/tmp"; + char tmp[PATH_MAX]; + int ret; + + if (getuid() != 0) { + /* try XDG path first, fall back to /tmp */ + if (xdg_runtime_dir != NULL) + directory = xdg_runtime_dir; + else + directory = fallback; + } + /* create DPDK subdirectory under runtime dir */ + ret = snprintf(tmp, sizeof(tmp), "%s/dpdk", directory); + if (ret < 0 || ret == sizeof(tmp)) { + RTE_LOG(ERR, EAL, "Error creating DPDK runtime path name\n"); + return -1; + } + + /* create prefix-specific subdirectory under DPDK runtime dir */ + ret = snprintf(runtime_dir, sizeof(runtime_dir), "%s/%s", + tmp, internal_config.hugefile_prefix); + if (ret < 0 || ret == sizeof(runtime_dir)) { + RTE_LOG(ERR, EAL, "Error creating prefix-specific runtime path name\n"); + return -1; + } + + /* create the path if it doesn't exist. no "mkdir -p" here, so do it +* step by step. 
+*/ + ret = mkdir(tmp, 0600); + if (ret < 0 && errno != EEXIST) { + RTE_LOG(ERR, EAL, "Error creating '%s': %s\n", + tmp, strerror(errno)); + return -1; + } + + ret = mkdir(runtime_dir, 0600); + if (ret < 0 && errno != EEXIST) { + RTE_LOG(ERR, EAL, "Error creating '%s': %s\n", + runtime_dir, strerror(errno)); + return -1; + } + + return 0; +} + +const char * +eal_get_runtime_dir(void) +{ + return runtime_dir; +} + /* Return user provided mbuf pool ops name */ const char * __rte_experimental rte_eal_mbuf_user_pool_ops(void) @@ -522,6 +583,13 @@ rte_eal_init(int argc, char **argv) /* set log level as early as possible */ eal_log_level_parse(argc, argv); + /* create runtime data directory */ + if (eal_create_runtime_dir() < 0) { + rte_eal_init_alert("Cannot create runtime directory\n"); + rte_errno = EACCES; + return -1; + } + if (rte_eal_cpu_init() < 0) { rte_eal_init_alert("Cannot detect lcores."); rte_errno = ENOTSUP; diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h index 060ac2b..67f5ca8 100644 --- a/lib/librte_eal/common/eal_filesystem.h +++ b/lib/librte_eal/common/eal_filesystem.h @@ -25,6 +25,14 @@ static const char *default_config_dir = "/var/run"; +/* sets up platform-specific runtime data dir */ +int +eal_create_runtime_dir(void); + +/* returns runtime dir */ +const char * +eal_get_runtime_dir(void); + static inline const char * eal_runtime_config_path(void) { diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index e2c0bd6..053b7e7 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -92,6 +92,66 @@ struct internal_config internal_config; /* used by rte_rdtsc() */ int rte_cycles_vmware_tsc_map; +/* platform-specific runtime dir */ +static char runtime_dir[PATH_MAX]; + +int +eal_create_runtime_dir(void) +{ + const char *directory = default_config_dir; + const char *xdg_runtime_dir = getenv("XDG_RUNTIME_DIR"); + const c
[dpdk-dev] [PATCH v2 2/4] eal: rename function returning hugepage data path
The original name for this path was not too descriptive and confusing. Rename it to a more appropriate and descriptive name: it stores data about hugepages, so name it eal_hugepage_data_path(). Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_filesystem.h | 2 +- lib/librte_eal/linuxapp/eal/eal_memory.c | 10 ++ 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h index 078e2eb..060ac2b 100644 --- a/lib/librte_eal/common/eal_filesystem.h +++ b/lib/librte_eal/common/eal_filesystem.h @@ -89,7 +89,7 @@ eal_hugepage_info_path(void) #define HUGEPAGE_FILE_FMT "%s/.%s_hugepage_file" static inline const char * -eal_hugepage_file_path(void) +eal_hugepage_data_path(void) { static char buffer[PATH_MAX]; /* static so auto-zeroed */ const char *directory = default_config_dir; diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index e0baabb..c917de1 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -1499,7 +1499,7 @@ eal_legacy_hugepage_init(void) } /* create shared memory */ - hugepage = create_shared_memory(eal_hugepage_file_path(), + hugepage = create_shared_memory(eal_hugepage_data_path(), nr_hugefiles * sizeof(struct hugepage_file)); if (hugepage == NULL) { @@ -1727,16 +1727,18 @@ eal_legacy_hugepage_attach(void) test_phys_addrs_available(); - fd_hugepage = open(eal_hugepage_file_path(), O_RDONLY); + fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY); if (fd_hugepage < 0) { - RTE_LOG(ERR, EAL, "Could not open %s\n", eal_hugepage_file_path()); + RTE_LOG(ERR, EAL, "Could not open %s\n", + eal_hugepage_data_path()); goto error; } size = getFileSize(fd_hugepage); hp = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd_hugepage, 0); if (hp == MAP_FAILED) { - RTE_LOG(ERR, EAL, "Could not mmap %s\n", eal_hugepage_file_path()); + RTE_LOG(ERR, EAL, "Could not mmap %s\n", + eal_hugepage_data_path()); goto error; } -- 2.7.4
[dpdk-dev] [PATCH v2 4/4] eal: move all runtime data into DPDK runtime dir
Fix all calls to functions in eal_filesystem to produce paths residing inside dedicated DPDK runtime directory. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal.c| 2 + lib/librte_eal/common/eal_filesystem.h | 71 +- lib/librte_eal/linuxapp/eal/eal.c | 2 + 3 files changed, 22 insertions(+), 53 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index 256ab2d..ebda2ef 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -87,6 +87,8 @@ int rte_cycles_vmware_tsc_map; /* platform-specific runtime dir */ static char runtime_dir[PATH_MAX]; +static const char *default_config_dir = "/var/run"; + int eal_create_runtime_dir(void) { diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h index 67f5ca8..c98102f 100644 --- a/lib/librte_eal/common/eal_filesystem.h +++ b/lib/librte_eal/common/eal_filesystem.h @@ -11,10 +11,6 @@ #ifndef EAL_FILESYSTEM_H #define EAL_FILESYSTEM_H -/** Path of rte config file. */ -#define RUNTIME_CONFIG_FMT "%s/.%s_config" -#define FBARRAY_FMT "%s/.%s_%s" - #include #include #include @@ -23,8 +19,6 @@ #include #include "eal_internal_cfg.h" -static const char *default_config_dir = "/var/run"; - /* sets up platform-specific runtime data dir */ int eal_create_runtime_dir(void); @@ -33,80 +27,57 @@ eal_create_runtime_dir(void); const char * eal_get_runtime_dir(void); +#define RUNTIME_CONFIG_FNAME "config" static inline const char * eal_runtime_config_path(void) { static char buffer[PATH_MAX]; /* static so auto-zeroed */ - const char *directory = default_config_dir; - const char *home_dir = getenv("HOME"); - if (getuid() != 0 && home_dir != NULL) - directory = home_dir; - snprintf(buffer, sizeof(buffer) - 1, RUNTIME_CONFIG_FMT, directory, - internal_config.hugefile_prefix); + snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(), + RUNTIME_CONFIG_FNAME); return buffer; } /** Path of primary/secondary communication unix socket file. */ -#define MP_SOCKET_PATH_FMT "%s/.%s_unix" +#define MP_SOCKET_FNAME "mp_socket" static inline const char * eal_mp_socket_path(void) { static char buffer[PATH_MAX]; /* static so auto-zeroed */ - const char *directory = default_config_dir; - const char *home_dir = getenv("HOME"); - - if (getuid() != 0 && home_dir != NULL) - directory = home_dir; - snprintf(buffer, sizeof(buffer) - 1, MP_SOCKET_PATH_FMT, -directory, internal_config.hugefile_prefix); + snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(), + MP_SOCKET_FNAME); return buffer; } +#define FBARRAY_NAME_FMT "%s/fbarray_%s" static inline const char * eal_get_fbarray_path(char *buffer, size_t buflen, const char *name) { - const char *directory = "/tmp"; - const char *home_dir = getenv("HOME"); - - if (getuid() != 0 && home_dir != NULL) - directory = home_dir; - snprintf(buffer, buflen - 1, FBARRAY_FMT, directory, - internal_config.hugefile_prefix, name); + snprintf(buffer, buflen, FBARRAY_NAME_FMT, eal_get_runtime_dir(), name); return buffer; } /** Path of hugepage info file. 
*/ -#define HUGEPAGE_INFO_FMT "%s/.%s_hugepage_info" - +#define HUGEPAGE_INFO_FNAME "hugepage_info" static inline const char * eal_hugepage_info_path(void) { static char buffer[PATH_MAX]; /* static so auto-zeroed */ - const char *directory = default_config_dir; - const char *home_dir = getenv("HOME"); - if (getuid() != 0 && home_dir != NULL) - directory = home_dir; - snprintf(buffer, sizeof(buffer) - 1, HUGEPAGE_INFO_FMT, directory, - internal_config.hugefile_prefix); + snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(), + HUGEPAGE_INFO_FNAME); return buffer; } -/** Path of hugepage info file. */ -#define HUGEPAGE_FILE_FMT "%s/.%s_hugepage_file" - +/** Path of hugepage data file. */ +#define HUGEPAGE_DATA_FNAME "hugepage_data" static inline const char * eal_hugepage_data_path(void) { static char buffer[PATH_MAX]; /* static so auto-zeroed */ - const char *directory = default_config_dir; - const char *home_dir = getenv("HOME"); - if (getuid() != 0 && home_dir != NULL) - directory = home_dir; - snprintf(buffer, sizeof(buffer) - 1, HUGEPAGE_FILE_FMT, directory, - internal_config.hugefile_prefix); + snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(), + HUGEPAGE_DATA_FNAME); return buffer; } @@ -122,18 +93,12 @@ eal_get_hugefile_path(char *buffer, size_t buflen, const
[dpdk-dev] [PATCH v2 1/4] eal: remove unused define
The define was a leftover from IVSHMEM library. Fixes: c711ccb30987 ("ivshmem: remove library and its EAL integration") Cc: david.march...@6wind.com Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov Reviewed-by: David Marchand --- lib/librte_eal/common/eal_filesystem.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h index 4db5c10..078e2eb 100644 --- a/lib/librte_eal/common/eal_filesystem.h +++ b/lib/librte_eal/common/eal_filesystem.h @@ -104,8 +104,6 @@ eal_hugepage_file_path(void) /** String format for hugepage map files. */ #define HUGEFILE_FMT "%s/%smap_%d" -#define TEMP_HUGEFILE_FMT "%s/%smap_temp_%d" - static inline const char * eal_get_hugefile_path(char *buffer, size_t buflen, const char *hugedir, int f_id) { -- 2.7.4
[dpdk-dev] [PATCH v1] net/mlx4: fix CRC stripping capability report
There are two capabilities related to CRC stripping: 1. mlx4 HW capability to perform CRC stripping on a received packet. This capability is built into the mlx4 HW. It should be returned by the API call mlx4_get_rx_queue_offloads(). 2. mlx4 driver capability to enable/disable HW CRC stripping. This capability is dependent on the driver version. Before this commit, the second capability was falsely returned by the mentioned API. This commit fixes it by returning the first capability. Fixes: de1df14e6e6ec ("net/mlx4: support CRC strip toggling") Cc: sta...@dpdk.org Signed-off-by: Ophir Munk --- drivers/net/mlx4/mlx4_rxq.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c index b430678..88e5912 100644 --- a/drivers/net/mlx4/mlx4_rxq.c +++ b/drivers/net/mlx4/mlx4_rxq.c @@ -658,10 +658,9 @@ mlx4_rxq_detach(struct rxq *rxq) uint64_t mlx4_get_rx_queue_offloads(struct priv *priv) { - uint64_t offloads = DEV_RX_OFFLOAD_SCATTER; + uint64_t offloads = DEV_RX_OFFLOAD_SCATTER | + DEV_RX_OFFLOAD_CRC_STRIP; - if (priv->hw_fcs_strip) - offloads |= DEV_RX_OFFLOAD_CRC_STRIP; if (priv->hw_csum) offloads |= DEV_RX_OFFLOAD_CHECKSUM; return offloads; -- 2.7.4
Re: [dpdk-dev] [PATCH 5/8 v4] raw/dpaa2_qdma: introduce the DPAA2 QDMA driver
24/04/2018 13:49, Nipun Gupta: > drivers/raw/dpaa2_qdma/dpaa2_qdma.c| 294 > + > drivers/raw/dpaa2_qdma/dpaa2_qdma.h| 66 + > drivers/raw/dpaa2_qdma/dpaa2_qdma_logs.h | 46 [...] > +install_headers('rte_pmd_dpaa2_qdma.h') I think you need to rename the exported header file with rte_pmd_ prefix.
Re: [dpdk-dev] [PATCH] eal: check if hugedir write lock is already being held
Monday, April 30, 2018 1:38 PM, Anatoly Burakov: > Cc: arybche...@solarflare.com; anatoly.bura...@intel.com > Subject: [dpdk-dev] [PATCH] eal: check if hugedir write lock is already being > held > > At hugepage info initialization, EAL takes out a write lock on hugetlbfs > directories, and drops it after the memory init is finished. However, in non- > legacy mode, if "-m" or "--socket-mem" > switches are passed, this leads to a deadlock because EAL tries to allocate > pages (and thus take out a write lock on hugedir) while still holding a > separate hugedir write lock in EAL. > > Fix it by checking if write lock in hugepage info is active, and not trying > to lock > the directory if the hugedir fd is valid. > > Fixes: 1a7dc2252f28 ("mem: revert to using flock and add per-segment > lockfiles") > Cc: anatoly.bura...@intel.com > > Signed-off-by: Anatoly Burakov > --- > lib/librte_eal/linuxapp/eal/eal_memalloc.c | 71 ++-- > -- > 1 file changed, 42 insertions(+), 29 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c > b/lib/librte_eal/linuxapp/eal/eal_memalloc.c > index 00d7886..360d8f7 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c > +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c > @@ -666,7 +666,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, > void *arg) > struct alloc_walk_param *wa = arg; > struct rte_memseg_list *cur_msl; > size_t page_sz; > - int cur_idx, start_idx, j, dir_fd; > + int cur_idx, start_idx, j, dir_fd = -1; > unsigned int msl_idx, need, i; > > if (msl->page_sz != wa->page_sz) > @@ -691,19 +691,24 @@ alloc_seg_walk(const struct rte_memseg_list *msl, > void *arg) >* because file creation and locking operations are not atomic, >* and we might be the first or the last ones to use a particular page, >* so we need to ensure atomicity of every operation. > + * > + * during init, we already hold a write lock, so don't try to take out > + * another one. 
>*/ > - dir_fd = open(wa->hi->hugedir, O_RDONLY); > - if (dir_fd < 0) { > - RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", > __func__, > - wa->hi->hugedir, strerror(errno)); > - return -1; > - } > - /* blocking writelock */ > - if (flock(dir_fd, LOCK_EX)) { > - RTE_LOG(ERR, EAL, "%s(): Cannot lock '%s': %s\n", __func__, > - wa->hi->hugedir, strerror(errno)); > - close(dir_fd); > - return -1; > + if (wa->hi->lock_descriptor == -1) { > + dir_fd = open(wa->hi->hugedir, O_RDONLY); > + if (dir_fd < 0) { > + RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", > + __func__, wa->hi->hugedir, strerror(errno)); > + return -1; > + } > + /* blocking writelock */ > + if (flock(dir_fd, LOCK_EX)) { > + RTE_LOG(ERR, EAL, "%s(): Cannot lock '%s': %s\n", > + __func__, wa->hi->hugedir, strerror(errno)); > + close(dir_fd); > + return -1; > + } > } > > for (i = 0; i < need; i++, cur_idx++) { @@ -742,7 +747,8 @@ > alloc_seg_walk(const struct rte_memseg_list *msl, void *arg) > if (wa->ms) > memset(wa->ms, 0, sizeof(*wa->ms) * wa- > >n_segs); > > - close(dir_fd); > + if (dir_fd >= 0) > + close(dir_fd); > return -1; > } > if (wa->ms) > @@ -754,7 +760,8 @@ alloc_seg_walk(const struct rte_memseg_list *msl, > void *arg) > wa->segs_allocated = i; > if (i > 0) > cur_msl->version++; > - close(dir_fd); > + if (dir_fd >= 0) > + close(dir_fd); > return 1; > } > > @@ -769,7 +776,7 @@ free_seg_walk(const struct rte_memseg_list *msl, > void *arg) > struct rte_memseg_list *found_msl; > struct free_walk_param *wa = arg; > uintptr_t start_addr, end_addr; > - int msl_idx, seg_idx, ret, dir_fd; > + int msl_idx, seg_idx, ret, dir_fd = -1; > > start_addr = (uintptr_t) msl->base_va; > end_addr = start_addr + msl->memseg_arr.len * (size_t)msl- > >page_sz; @@ -788,19 +795,24 @@ free_seg_walk(const struct > rte_memseg_list *msl, void *arg) >* because file creation and locking operations are not atomic, >* and we might be the first or the last ones to use a particular page, >* so we need to ensure atomicity of every operation. > + * > + * during init, we already hold a write lock, so don't try to take out > + * another one. >*/ > - dir_fd = open(wa->hi->hugedir, O_RDONLY); > - if (dir_fd < 0) { > - RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n", > __func__
Re: [dpdk-dev] [PATCH v2 1/2] mem: check if allocation size is too big
On Mon, Apr 30, 2018 at 12:21:42PM +0100, Anatoly Burakov wrote: > Mapping size is a 64-bit integer, but mmap() will accept size_t for > size mappings. A user could request a mapping with an alignment, which > would have overflown size_t, so check if (size + alignment) will > overflow size_t. > > Signed-off-by: Anatoly Burakov > --- Acked-by: Bruce Richardson
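The overflow concern described above amounts to a simple pre-check before the size plus alignment is handed to mmap(). A minimal sketch of that kind of check, illustrative only and not the exact EAL code (the helper name is made up):

#include <stdint.h>
#include <stddef.h>

/* Reject a reservation whose size plus alignment would wrap around size_t. */
static int
reserve_params_ok(size_t size, size_t align)
{
    return align <= SIZE_MAX - size;
}

/* Caller side, before mapping:
 *   if (!reserve_params_ok(size, align))
 *       return NULL;
 */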
Re: [dpdk-dev] [PATCH v2 2/2] mem: unmap unneeded space
On Mon, Apr 30, 2018 at 12:21:43PM +0100, Anatoly Burakov wrote: > When we ask to reserve virtual areas, we usually include > alignment in the mapping size, and that memory ends up > being wasted. Wasting a gigabyte of VA space while trying to > reserve one gigabyte is pretty expensive on 32-bit, so after > we're done mapping, unmap unneeded space. > > Signed-off-by: Anatoly Burakov > --- > > Notes: > v2: > - Split fix for size_t overflow into separate patch > - Improve readability of unmapping code > - Added comment explaining why unmapping is done > Acked-by: Bruce Richardson
Re: [dpdk-dev] [PATCH] vhost/crypto: fix bracket
30/04/2018 12:36, Fan Zhang: > Coverity issue: 233232 > Coverity issue: 233237 > Fixes: 3bb595ecd682 ("vhost/crypto: add request handler") > > Signed-off-by: Fan Zhang 2 comments, Fan: 1/ I think it the v2 of a previous commit. Please update the patchwork status (superseded), use -v option for revision numbering, and add a changelog. 2/ The title must give the scope of the change, or give an idea of the impact of the patch. Example: fix symmetric ciphering The root cause (bracket location) is better in the commit message than the title. Thanks
[dpdk-dev] New RC needed for stability?
Hi Thomas, Ferruh, all, Initial testing on RC1 from our System Test and Validation shows a lot of defects/issues, and from the list it appears others may be seeing issues too. These issues, as well as causing test failures are blocking other tests from being run. We are also seeing some serious performance regressions with testpmd. Based on the number of issues we are seeing, which is probably a result of the huge number of changes in this release even at the RC1 point, I would propose that we look to do a new RC some time this week to get as many critical bugs as possible ironed out, before we add in the remaining 18.05 features. That is: *bug-fix only* RC2, say Wed/Thurs (or sooner if fixes are ready), with remaining features in RC3 as planned on 11th. Thoughts, comments? /Bruce
Re: [dpdk-dev] New RC needed for stability?
30/04/2018 14:57, Bruce Richardson: > Hi Thomas, Ferruh, all, > > Initial testing on RC1 from our System Test and Validation shows a lot of > defects/issues, and from the list it appears others may be seeing issues > too. These issues, as well as causing test failures are blocking other > tests from being run. We are also seeing some serious performance > regressions with testpmd. > > Based on the number of issues we are seeing, which is probably a result of > the huge number of changes in this release even at the RC1 point, I would > propose that we look to do a new RC some time this week to get as many > critical bugs as possible ironed out, before we add in the remaining 18.05 > features. That is: *bug-fix only* RC2, say Wed/Thurs (or sooner if fixes > are ready), with remaining features in RC3 as planned on 11th. > > Thoughts, comments? OK +1 We need to agree on a list of bugs to be fixed. Are there some Bugzilla entries? When they will be fixed, I will tag the RC2.
Re: [dpdk-dev] [PATCH] eal: check if hugedir write lock is already being held
On 04/30/2018 01:38 PM, Anatoly Burakov wrote: At hugepage info initialization, EAL takes out a write lock on hugetlbfs directories, and drops it after the memory init is finished. However, in non-legacy mode, if "-m" or "--socket-mem" switches are passed, this leads to a deadlock because EAL tries to allocate pages (and thus take out a write lock on hugedir) while still holding a separate hugedir write lock in EAL. Fix it by checking if write lock in hugepage info is active, and not trying to lock the directory if the hugedir fd is valid. Fixes: 1a7dc2252f28 ("mem: revert to using flock and add per-segment lockfiles") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov Tested-by: Andrew Rybchenko
Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
On 04/30/2018 02:31 PM, Burakov, Anatoly wrote: On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote: On 04/25/2018 01:36 PM, Anatoly Burakov wrote: The original implementation used flock() locks, but was later switched to using fcntl() locks for page locking, because fcntl() locks allow locking parts of a file, which is useful for single-file segments mode, where locking the entire file isn't as useful because we still need to grow and shrink it. However, according to fcntl()'s Ubuntu manpage [1], semantics of fcntl() locks have a giant oversight: This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. Basically, closing *any* fd with an fcntl() lock (which we do because we don't want to leak fd's) will drop the lock completely. So, in this commit, we will be reverting back to using flock() locks everywhere. However, that still leaves the problem of locking parts of a memseg list file in single file segments mode, and we will be solving it with creating separate lock files per each page, and tracking those with flock(). We will also be removing all of this tailq business and replacing it with a simple array - saving a few bytes is not worth the extra hassle of dealing with pointers and potential memory allocation failures. Also, remove the tailq lock since it is not needed - these fd lists are per-process, and within a given process, it is always only one thread handling access to hugetlbfs. So, first one to allocate a segment will create a lockfile, and put a shared lock on it. When we're shrinking the page file, we will be trying to take out a write lock on that lockfile, which would fail if any other process is holding onto the lockfile as well. This way, we can know if we can shrink the segment file. Also, if no other locks are found in the lock list for a given memseg list, the memseg list fd is automatically closed. One other thing to note is, according to flock() Ubuntu manpage [2], upgrading the lock from shared to exclusive is implemented by dropping and reacquiring the lock, which is not atomic and thus would have created race conditions. So, on attempting to perform operations in hugetlbfs, we will take out a writelock on hugetlbfs directory, so that only one process could perform hugetlbfs operations concurrently. [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists") Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime") Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime") Fixes: 2a04139f66b4 ("eal: add single file segments option") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov Acked-by: Bruce Richardson We have a problem with the changeset if EAL option -m or --socket-mem is used. EAL initialization hangs just after EAL: Probing VFIO support... 
strace points to flock(7, LOCK_EX List of file descriptors: # ls /proc/25452/fd -l total 0 lrwx-- 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0 lrwx-- 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config lrwx-- 1 root root 64 Apr 28 10:34 4 -> socket:[154166] lrwx-- 1 root root 64 Apr 28 10:34 5 -> socket:[154158] lr-x-- 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages lr-x-- 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages I guess the problem is that there are two /dev/hugepages and it hangs on the second. Ideas how to solve it? Andrew. Hi Andrew, Please try the following patch: http://dpdk.org/dev/patchwork/patch/39166/ This should fix the issue. Hi Anatoly, yes, it fixes the issue. Thanks, Andrew.
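For readers unfamiliar with the fcntl() behaviour quoted in the commit message above, here is a tiny standalone demonstration of why it is problematic (not DPDK code; the file name is arbitrary): a record lock taken through one descriptor is silently dropped as soon as the same process closes any other descriptor for the same file.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int fd1 = open("/tmp/lockdemo", O_CREAT | O_RDWR, 0600);
    int fd2 = open("/tmp/lockdemo", O_RDWR);

    if (fd1 < 0 || fd2 < 0)
        return 1;
    if (fcntl(fd1, F_SETLK, &fl) < 0) /* lock the whole file via fd1 */
        return 1;

    close(fd2); /* POSIX: this also releases the lock taken via fd1 */

    printf("lock is gone although fd1 is still open\n");
    return 0;
}

flock() locks, by contrast, stay attached to the open file description, which is the property the revert relies on.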
Re: [dpdk-dev] [PATCH 1/2] eal: fix build with glibc < 2.16
Thomas Monjalon writes: > The fake getauxval function does not use its parameter. > So the compiler raised this error: > lib/librte_eal/common/eal_common_cpuflags.c:25:25: error: > unused parameter 'type' > > Fixes: 2ed9bf330709 ("eal: abstract away the auxiliary vector") > Cc: acon...@redhat.com > Cc: tredae...@redhat.com > > Signed-off-by: Thomas Monjalon > --- Oops - sorry about that. Acked-by: Aaron Conole > lib/librte_eal/common/eal_common_cpuflags.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_eal/common/eal_common_cpuflags.c > b/lib/librte_eal/common/eal_common_cpuflags.c > index a09667563..6a9dbaeb1 100644 > --- a/lib/librte_eal/common/eal_common_cpuflags.c > +++ b/lib/librte_eal/common/eal_common_cpuflags.c > @@ -22,7 +22,7 @@ > > #ifndef HAS_AUXV > static unsigned long > -getauxval(unsigned long type) > +getauxval(unsigned long type __rte_unused) > { > errno = ENOTSUP; > return 0;
Re: [dpdk-dev] [PATCH 2/2] eal: fix build on FreeBSD
Thomas Monjalon writes: > The auxiliary vector read is implemented only for Linux. > It could be done with procstat_getauxv() for FreeBSD. > > Since the commit below, the auxiliary vector functions > are compiled for every architectures, including x86 > which is tested with FreeBSD. > > This patch is only adding a fake/empty implementation > of auxiliary vector read, for compilation on FreeBSD. > > Fixes: 2ed9bf330709 ("eal: abstract away the auxiliary vector") > Cc: acon...@redhat.com > Cc: tredae...@redhat.com > > Signed-off-by: Thomas Monjalon > --- Makes sense to me. Thanks for fixing this up, Thomas. Sorry for turning it sideways. I'll make sure to test on freebsd next time. Acked-by: Aaron Conole > lib/librte_eal/bsdapp/eal/Makefile | 1 + > lib/librte_eal/bsdapp/eal/eal_cpuflags.c | 21 ++ > lib/librte_eal/bsdapp/eal/meson.build | 1 + > lib/librte_eal/common/eal_common_cpuflags.c| 79 > -- > lib/librte_eal/linuxapp/eal/Makefile | 1 + > .../eal/eal_cpuflags.c}| 47 + > lib/librte_eal/linuxapp/eal/meson.build| 1 + > 7 files changed, 26 insertions(+), 125 deletions(-) > create mode 100644 lib/librte_eal/bsdapp/eal/eal_cpuflags.c > copy lib/librte_eal/{common/eal_common_cpuflags.c => > linuxapp/eal/eal_cpuflags.c} (61%) > > diff --git a/lib/librte_eal/bsdapp/eal/Makefile > b/lib/librte_eal/bsdapp/eal/Makefile > index 200285e01..3fd33f1e4 100644 > --- a/lib/librte_eal/bsdapp/eal/Makefile > +++ b/lib/librte_eal/bsdapp/eal/Makefile > @@ -25,6 +25,7 @@ LIBABIVER := 7 > > # specific to bsdapp exec-env > SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c > +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_cpuflags.c > SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_memory.c > SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_hugepage_info.c > SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_thread.c > diff --git a/lib/librte_eal/bsdapp/eal/eal_cpuflags.c > b/lib/librte_eal/bsdapp/eal/eal_cpuflags.c > new file mode 100644 > index 0..69b161ea6 > --- /dev/null > +++ b/lib/librte_eal/bsdapp/eal/eal_cpuflags.c > @@ -0,0 +1,21 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright 2018 Mellanox Technologies, Ltd > + */ > + > +#include > +#include > + > +unsigned long > +rte_cpu_getauxval(unsigned long type __rte_unused) > +{ > + /* not implemented */ > + return 0; > +} > + > +int > +rte_cpu_strcmp_auxval(unsigned long type __rte_unused, > + const char *str __rte_unused) > +{ > + /* not implemented */ > + return -1; > +} > diff --git a/lib/librte_eal/bsdapp/eal/meson.build > b/lib/librte_eal/bsdapp/eal/meson.build > index 4c5611879..47e16a649 100644 > --- a/lib/librte_eal/bsdapp/eal/meson.build > +++ b/lib/librte_eal/bsdapp/eal/meson.build > @@ -4,6 +4,7 @@ > env_objs = [] > env_headers = [] > env_sources = files('eal_alarm.c', > + 'eal_cpuflags.c', > 'eal_debug.c', > 'eal_hugepage_info.c', > 'eal_interrupts.c', > diff --git a/lib/librte_eal/common/eal_common_cpuflags.c > b/lib/librte_eal/common/eal_common_cpuflags.c > index 6a9dbaeb1..3a055f7c7 100644 > --- a/lib/librte_eal/common/eal_common_cpuflags.c > +++ b/lib/librte_eal/common/eal_common_cpuflags.c > @@ -2,90 +2,11 @@ > * Copyright(c) 2010-2014 Intel Corporation > */ > > -#include > -#include > #include > -#include > -#include > -#include > -#include > - > -#if defined(__GLIBC__) && defined(__GLIBC_PREREQ) > -#if __GLIBC_PREREQ(2, 16) > -#include > -#define HAS_AUXV 1 > -#endif > -#endif > > #include > #include > > -#ifndef HAS_AUXV > -static unsigned long > -getauxval(unsigned long type __rte_unused) > -{ > - errno = ENOTSUP; > - return 0; > -} > -#endif > - > -#ifdef 
RTE_ARCH_64 > -typedef Elf64_auxv_t Internal_Elfx_auxv_t; > -#else > -typedef Elf32_auxv_t Internal_Elfx_auxv_t; > -#endif > - > - > -/** > - * Provides a method for retrieving values from the auxiliary vector and > - * possibly running a string comparison. > - * > - * @return Always returns a result. When the result is 0, check errno > - * to see if an error occurred during processing. > - */ > -static unsigned long > -_rte_cpu_getauxval(unsigned long type, const char *str) > -{ > - unsigned long val; > - > - errno = 0; > - val = getauxval(type); > - > - if (!val && (errno == ENOTSUP || errno == ENOENT)) { > - int auxv_fd = open("/proc/self/auxv", O_RDONLY); > - Internal_Elfx_auxv_t auxv; > - > - if (auxv_fd == -1) > - return 0; > - > - errno = ENOENT; > - while (read(auxv_fd, &auxv, sizeof(auxv)) == sizeof(auxv)) { > - if (auxv.a_type == type) { > - errno = 0; > - val = auxv.a_un.a_val; > - if (str) > -
Re: [dpdk-dev] [PATCH] eal: check if hugedir write lock is already being held
30/04/2018 15:07, Andrew Rybchenko: > On 04/30/2018 01:38 PM, Anatoly Burakov wrote: > > At hugepage info initialization, EAL takes out a write lock on > > hugetlbfs directories, and drops it after the memory init is > > finished. However, in non-legacy mode, if "-m" or "--socket-mem" > > switches are passed, this leads to a deadlock because EAL tries > > to allocate pages (and thus take out a write lock on hugedir) > > while still holding a separate hugedir write lock in EAL. > > > > Fix it by checking if write lock in hugepage info is active, and > > not trying to lock the directory if the hugedir fd is valid. > > > > Fixes: 1a7dc2252f28 ("mem: revert to using flock and add per-segment > > lockfiles") > > Cc: anatoly.bura...@intel.com > > > > Signed-off-by: Anatoly Burakov Tested-by: Maxime Coquelin Tested-by: Shahaf Shuler > Tested-by: Andrew Rybchenko Applied, thanks
Re: [dpdk-dev] [PATCH] mem: fix heap size not set on init
25/04/2018 15:42, Anatoly Burakov: > When heap initializes, we need to add already allocated segments > onto the heap. However, in doing that, we never increased total > heap size. Fix it by adding segment length to total heap length > when initializing the heap. > > Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists") > Cc: anatoly.bura...@intel.com > > Signed-off-by: Anatoly Burakov Applied, thanks
[dpdk-dev] [PATCH] examples/flow_classify: fix failure in port_init function
The port_init function calls the rte_eth_dev_is_valid_port function. This function now returns 1 if the port state is attached. A return value of 1 now means a valid port. Fixes: a9dbe1802226 ("fix ethdev port id validation") Signed-off-by: Bernard Iremonger --- examples/flow_classify/flow_classify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/flow_classify/flow_classify.c b/examples/flow_classify/flow_classify.c index 3b087ce..6412fe4 100644 --- a/examples/flow_classify/flow_classify.c +++ b/examples/flow_classify/flow_classify.c @@ -200,7 +200,7 @@ struct rte_flow_query_count count = { struct rte_eth_dev_info dev_info; struct rte_eth_txconf txconf; - if (rte_eth_dev_is_valid_port(port)) + if (!rte_eth_dev_is_valid_port(port)) return -1; rte_eth_dev_info_get(port, &dev_info); -- 1.9.1
Re: [dpdk-dev] [PATCH] examples/flow_classify: fix failure in port_init function
30/04/2018 15:43, Bernard Iremonger: > The port_init function calls the rte_eth_dev_is_valid_port function. > This function now returns 1 if the port state is attached. > A return value of 1 now means a valid port. > > Fixes: a9dbe1802226 ("fix ethdev port id validation") > Signed-off-by: Bernard Iremonger My mistake. Applied, thanks
Re: [dpdk-dev] [PATCH] eal/service: remove experimental tags
I just wanted to say I'm using the functionality for debugging NFP firmware and getting some useful information from the device. I did not plan to have this upstream, but after this patch for removing the experimental tag, I think it would be a good idea. Thanks! On Wed, Apr 25, 2018 at 1:58 PM, Thomas Monjalon wrote: > > > This commit removes the experimental tags from the > > > service cores functions, they now become part of the > > > main DPDK API/ABI. > > > > > > Signed-off-by: Harry van Haaren > > > > Acked-by: Jerin Jacob > > Acked-by: Thomas Monjalon > > Applied, congratulations! > > >
Re: [dpdk-dev] [PATCH v5 0/4] ethdev additions to support tunnel encap/decap
24/04/2018 18:26, Thomas Monjalon: > Hi, > > > Declan Doherty (4): > > ethdev: Add tunnel encap/decap actions > > ethdev: Add group JUMP action > > ethdev: add mark flow item to rte_flow_item_types > > ethdev: add shared counter support to rte_flow > > No specific comment. > > It is only an API without any PMD implementation. > Which PMDs are planned to be supported? When? > > Next time, we could require to have at least one implementation, > when submitting a new API. One more comment: there is no testpmd usage of this API. Please Declan, could you fix testpmd by adding new commands using this new flow encapsulation feature? We need it in 18.05 in order to avoid having some orphan code. Thanks
Re: [dpdk-dev] [PATCH] net/vhost: Initialise vid to -1
> > On 04/27/2018 04:19 PM, Ciara Loftus wrote: > > rte_eth_vhost_get_vid_from_port_id returns a value of 0 if called before > > the first call to the new_device callback. A vid value >=0 suggests the > > device is active which is not the case in this instance. Initialise vid > > to a negative value to prevent this. > > > > Signed-off-by: Ciara Loftus > > --- > > drivers/net/vhost/rte_eth_vhost.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/net/vhost/rte_eth_vhost.c > b/drivers/net/vhost/rte_eth_vhost.c > > index 99a7727..f47950c 100644 > > --- a/drivers/net/vhost/rte_eth_vhost.c > > +++ b/drivers/net/vhost/rte_eth_vhost.c > > @@ -1051,6 +1051,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev, > uint16_t rx_queue_id, > > return -ENOMEM; > > } > > > > + vq->vid = -1; > > vq->mb_pool = mb_pool; > > vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ; > > dev->data->rx_queues[rx_queue_id] = vq; > > > > Reviewed-by: Maxime Coquelin > > Thanks, > Maxime On second thoughts, self-NACK. We need to provision for the case where we want to call eth_rx_queue_setup AFTER new_device. For instance when we want to change the mb_pool. In this case we need to maintain the same vid and not reset it to -1. Without this patch the original problem still exists and need to find an alternative workaround. Thanks, Ciara
[dpdk-dev] nfp doing its own pci_read_config
Why is Netronome driver using its own version of existing rte_pci_read_config? And hard coding magic numbers for offsets. This shows up as Coverity error *** CID 277243: Error handling issues (CHECKED_RETURN) /drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c: 684 in nfp6000_set_interface() 678 desc->busdev); 679 680 fp = open(tmp_str, O_RDONLY); 681 if (!fp) 682 return -1; 683 >>> CID 277243: Error handling issues (CHECKED_RETURN) >>> Calling "lseek(fp, 340L, 0)" without checking return value. This >>> library function may fail and return an error code. 684 lseek(fp, 0x154, SEEK_SET); 685 686 if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) { 687 printf("error reading config file for interface\n"); 688 return -1; 689 static int nfp6000_set_model(struct nfp_pcie_user *desc, struct nfp_cpp *cpp) { char tmp_str[80]; uint32_t tmp; int fp; snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES, desc->busdev); fp = open(tmp_str, O_RDONLY); if (!fp) return -1; lseek(fp, 0x2e, SEEK_SET); if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) { printf("Error reading config file for model\n"); return -1; } tmp = tmp << 16;
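For comparison, a sketch of what the same config-space read could look like if it went through the existing bus/PCI helper instead of opening the sysfs config file directly. This is illustrative only; the function name nfp6000_read_model is made up, and it assumes the caller has the rte_pci_device handle at hand:

#include <stdint.h>
#include <rte_bus_pci.h>

static int
nfp6000_read_model(const struct rte_pci_device *pci_dev, uint32_t *model)
{
    uint32_t tmp;

    /* read 4 bytes at config-space offset 0x2e; errors are reported */
    if (rte_pci_read_config(pci_dev, &tmp, sizeof(tmp), 0x2e) !=
            (int)sizeof(tmp))
        return -1;

    *model = tmp << 16;
    return 0;
}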
[dpdk-dev] [PATCH] net/i40e: revert default PF PMD device name
Changes introduced by e0cb96204b71 modified the default name generated for the i40e PF PMD; this patch reverts the default name to the original PCI DBDF. Fixes: e0cb96204b71 ("net/i40e: add support for representor ports") Signed-off-by: Declan Doherty --- drivers/net/i40e/i40e_ethdev.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index d869add95..284e9cb64 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -630,10 +630,7 @@ eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, return retval; } - /* physical port net_bdf_port */ - snprintf(name, sizeof(name), "net_%s", pci_dev->device.name); - - retval = rte_eth_dev_create(&pci_dev->device, name, + retval = rte_eth_dev_create(&pci_dev->device, pci_dev->device.name, sizeof(struct i40e_adapter), eth_dev_pci_specific_init, pci_dev, eth_i40e_dev_init, NULL); @@ -642,7 +639,8 @@ eth_i40e_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, return retval; /* probe VF representor ports */ - struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name); + struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated( + pci_dev->device.name); if (pf_ethdev == NULL) return -ENODEV; -- 2.14.3
[dpdk-dev] [PATCH 2/3] net/ixgbe: initialise nb_representor_ports value
Initialise rte_eth_devargs nb_representor_ports to zero to handle the case where no devargs are passed to the IXGBE PF on device probe, so that there are no invalid attempts to create representor ports. Coverity Issue: 277231 Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports") Signed-off-by: Declan Doherty --- drivers/net/ixgbe/ixgbe_ethdev.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 0ccf55dc8..283dd7e49 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1725,8 +1725,7 @@ eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_device *pci_dev) { char name[RTE_ETH_NAME_MAX_LEN]; - - struct rte_eth_devargs eth_da; + struct rte_eth_devargs eth_da = { .nb_representor_ports = 0 }; int i, retval; if (pci_dev->device.devargs) { -- 2.14.3
[dpdk-dev] [PATCH 1/3] net/ixgbe: revert default PF PMD device name
Changes introduced by cf80ba6e2038 modified the default name generated for the IXGBE PF PMD; this patch reverts the default name to the original PCI DBDF. Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports") Signed-off-by: Declan Doherty --- drivers/net/ixgbe/ixgbe_ethdev.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 6088c7e48..0ccf55dc8 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1736,10 +1736,7 @@ eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, return retval; } - /* physical port net_bdf_port */ - snprintf(name, sizeof(name), "net_%s_%d", pci_dev->device.name, 0); - - retval = rte_eth_dev_create(&pci_dev->device, name, + retval = rte_eth_dev_create(&pci_dev->device, pci_dev->device.name, sizeof(struct ixgbe_adapter), eth_dev_pci_specific_init, pci_dev, eth_ixgbe_dev_init, NULL); @@ -1748,7 +1745,8 @@ eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, return retval; /* probe VF representor ports */ - struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated(name); + struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated( + pci_dev->device.name); for (i = 0; i < eth_da.nb_representor_ports; i++) { struct ixgbe_vf_info *vfinfo; struct ixgbe_vf_representor representor; -- 2.14.3
[dpdk-dev] [PATCH 3/3] net/ixgbe: add null pointer check for pf_ethdev
Add NULL parameter check for rte_eth_dev_allocated() API call to eth_ixgbe_pci_probe(). Coverity Issue: 277216 Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports") Signed-off-by: Declan Doherty --- drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 283dd7e49..75f927c06 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1747,6 +1747,9 @@ eth_ixgbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_eth_dev *pf_ethdev = rte_eth_dev_allocated( pci_dev->device.name); + if (pf_ethdev == NULL) + return -ENODEV; + for (i = 0; i < eth_da.nb_representor_ports; i++) { struct ixgbe_vf_info *vfinfo; struct ixgbe_vf_representor representor; -- 2.14.3
[dpdk-dev] pthread_barrier_deadlock in -rc1 (was: "Re: [PATCH v3 0/5] fix control thread affinities")
Hi Olivier, On 04/24/2018 04:46 PM, Olivier Matz wrote: Some parts of dpdk use their own management threads. Most of the time, the affinity of the thread is not properly set: it should not be scheduled on the dataplane cores, because interrupting them can cause packet losses. This patchset introduces a new wrapper for thread creation that does the job automatically, avoiding code duplication. v3: * new patch: use this API in examples when relevant. * replace pthread_kill by pthread_cancel. Note that pthread_join() is still needed. * rebase: vfio and pdump do not have control pthreads anymore, and eal has 2 new pthreads * remove all calls to snprintf/strlcpy that truncate the thread name: all strings lengths are already < 16. v2: * set affinity to master core if no core is off, as suggested by Anatoly Olivier Matz (5): eal: use sizeof to avoid a double use of a define eal: new function to create control threads eal: set name when creating a control thread eal: set affinity for control threads examples: use new API to create control threads drivers/net/kni/Makefile | 1 + drivers/net/kni/rte_eth_kni.c| 3 +- examples/tep_termination/main.c | 16 +++ examples/vhost/main.c| 19 +++- lib/librte_eal/bsdapp/eal/eal.c | 4 +- lib/librte_eal/bsdapp/eal/eal_thread.c | 2 +- lib/librte_eal/common/eal_common_proc.c | 15 ++ lib/librte_eal/common/eal_common_thread.c| 72 lib/librte_eal/common/include/rte_lcore.h| 26 ++ lib/librte_eal/linuxapp/eal/eal.c| 4 +- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 17 ++- lib/librte_eal/linuxapp/eal/eal_thread.c | 2 +- lib/librte_eal/linuxapp/eal/eal_timer.c | 12 + lib/librte_eal/rte_eal_version.map | 1 + lib/librte_vhost/socket.c| 25 ++ 15 files changed, 135 insertions(+), 84 deletions(-) I face a deadlock issue with your series, that Jianfeng patch does not resolve ("eal: fix threads block on barrier"). Reverting the series and Jianfeng patch makes the issue to disappear. I face the problem in a VM (not seen on the host): # ./install/bin/testpmd -l 0,1,2 --socket-mem 1024 -n 4 --proc-type auto --file-prefix pg -- --portmask=3 --forward-mode=macswap --port-topology=chained --disable-rss -i --rxq=1 --txq=1 --rxd=256 --txd=256 --nb-cores=2 --auto-start EAL: Detected 3 lcore(s) EAL: Detected 1 NUMA nodes EAL: Auto-detected process type: PRIMARY EAL: Multi-process socket /var/run/.pg_unix Then it is stuck. Attaching with GDB, I get below backtrace information: (gdb) info threads Id Target Id Frame 3Thread 0x7f63e1f9f700 (LWP 8808) "rte_mp_handle" 0x7f63e2591bfd in recvmsg () at ../sysdeps/unix/syscall-template.S:81 2Thread 0x7f63e179e700 (LWP 8809) "rte_mp_async" pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 * 1Thread 0x7f63e32cec00 (LWP 8807) "testpmd" pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 (gdb) bt full #0 pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 No locals. #1 0x00520c54 in rte_ctrl_thread_create (thread=thread@entry=0x7ffe5c895020, name=name@entry=0x869d86 "rte_mp_async", attr=attr@entry=0x0, start_routine=start_routine@entry=0x521030 , arg=arg@entry=0x0) at /root/src/dpdk/lib/librte_eal/common/eal_common_thread.c:207 params = 0x17b1e40 lcore_id = cpuset = {__bits = {1, 0 }} cpu_found = ret = 0 #2 0x005220b6 in rte_mp_channel_init () at /root/src/dpdk/lib/librte_eal/common/eal_common_proc.c:674 path = "/var/run\000.pg_unix_*", '\000' ... 
dir_fd = 4 mp_handle_tid = 140066969745152 async_reply_handle_tid = 140066961352448 #3 0x0050c227 in rte_eal_init (argc=argc@entry=23, argv=argv@entry=0x7ffe5c896378) at /root/src/dpdk/lib/librte_eal/linuxapp/eal/eal.c:775 i = fctret = 11 ret = thread_id = 140066989861888 run_once = {cnt = 1} logid = 0x17b1e00 "testpmd" cpuset = "T}\211\\\376\177", '\000' , "\020", '\000' ... thread_name = "X}\211\\\376\177\000\000\226\301\036\342c\177\000" __func__ = "rte_eal_init" #4 0x00473214 in main (argc=23, argv=0x7ffe5c896378) at /root/src/dpdk/app/test-pmd/testpmd.c:2597 diag = port_id = ret = __func__ = "main" (gdb) thread 2 [Switching to thread 2 (Thread 0x7f63e179e700 (LWP 8809))] #0 pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 71 cmpl%edx, (%rdi) (gdb) bt full #0 pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 No locals. #1 0x
[dpdk-dev] [PATCH] vhost: improve dirty pages logging performance
This patch caches all dirty pages logging until the used ring index is updated. These dirty pages won't be accessed by the guest as long as the host doesn't give them back to it by updating the index. The goal of this optimization is to fix a performance regression introduced when the vhost library started to use atomic operations to set bits in the shared dirty log map. While the fix was valid as previous implementation wasn't safe against concurent accesses, contention was induced. With this patch, during migration, we have: 1. Less atomic operations as only a single atomic OR operation per 32 pages. 2. Less atomic operations as during a burst, the same page will be marked dirty only once. 3. Less write memory barriers. Fixes: 897f13a1f726 ("vhost: make page logging atomic") Cc: sta...@dpdk.org Suggested-by: Michael S. Tsirkin Signed-off-by: Maxime Coquelin --- Hi, This series was tested with migrating a guest while running PVP benchmark at 1Mpps with both ovs-dpdk and testpmd as vswitch. With this patch we recover the packet drops regressions seen since the use of atomic operations to log dirty pages. Some numbers: A) PVP Live migration using testpmd (Single queue pair) --- Without patch: =Stream Rate: 1Mpps== No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 134 1896616 11628963.0 1 1Mpps 125 18790168436300.0 2 1Mpps 122 1917115 13881342.0 3 1Mpps 132 1891315 12079492.0Max 1Mpps 134 1917116 13881342 Min 1Mpps 122 1879015 8436300 Mean 1Mpps 128 1896015 11506524 Median 1Mpps 128 1893915 11854227 Stdev 0 5.68158.81 0.58 2266371.52 = With patch: =Stream Rate: 1Mpps== No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 119 1352115 478174.0 1 1Mpps 116 1408414 452018.0 2 1Mpps 122 1390814 464486.0 3 1Mpps 129 1378716 478234.0 Max 1Mpps 129 1408416 478234 Min 1Mpps 116 1352114 452018 Mean 1Mpps 121 1382514 468228 Median 1Mpps 120 1384714 471330 Stdev 0 5.57236.52 0.96 12593.78 = B) OVS-DPDK migration with 2 queue pairs Without patch: ===Stream Rate: 1Mpps No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 146 20270 116 15937394.0 1 1Mpps 150 2561716 11120370.0 2 1Mpps 138 40983 115 24779971.0 3 1Mpps 161 2043517 15363519.0 Max 1Mpps 161 40983 116 24779971 Min 1Mpps 138 2027016 11120370 Mean 1Mpps 148 2682666 16800313 Median 1Mpps 148 2302666 15650456 Stdev 0 9.579758.9 57.16 5737179.93 = With patch: ===Stream Rate: 1Mpps No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 155 1891517 481330.0 1 1Mpps 156 2109718 370556.0 2 1Mpps 156 4961015 369291.0 3 1Mpps 144 3112415 361914.0 Max 1Mpps 156 4961018 481330 Min 1Mpps 144 1891515 361914 Mean 1Mpps 152 3018616 395772 Median 1Mpps 155 2611016 369923 Stdev 0 5.85 13997.82 1.5 57165.33 = C) OVS-DPDK migration with single queue pair Without patch: ===Stream Rate: 1Mpps No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 129 1741115 11105414.0 1 1Mpps 130 16544158028438.0 2 1Mpps 132 15202157491584.0 3 1Mpps 133 18100158385047.0 Max 1Mpps 133 1810015 11105414 Min 1Mpps 129 1520215 7491584 Mean 1Mp
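A rough sketch of the caching scheme described above; the names and sizes here are illustrative and not the exact patch. Dirty bits are first accumulated per 32-page block in a small private cache, and the shared log is only touched with a single atomic OR per block when the cache is flushed, just before the used ring index is updated.

#include <stdint.h>
#include <rte_atomic.h>

#define LOG_CACHE_NR 32

struct log_cache_entry {
    uint32_t offset; /* which 32-bit word of the shared dirty log */
    uint32_t val;    /* dirty bits accumulated locally */
};

static void
log_cache_sync(struct log_cache_entry *cache, uint16_t nr_cached,
        uint32_t *shared_log)
{
    uint16_t i;

    if (nr_cached == 0)
        return;

    rte_smp_wmb(); /* order guest page writes before log updates */
    for (i = 0; i < nr_cached; i++)
        __sync_fetch_and_or(&shared_log[cache[i].offset], cache[i].val);
    rte_smp_wmb(); /* order log updates before the used index write */
}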
Re: [dpdk-dev] pthread_barrier_deadlock in -rc1 (was: "Re: [PATCH v3 0/5] fix control thread affinities")
Hi Maxime, Le 30 avril 2018 17:45:52 GMT+02:00, Maxime Coquelin a écrit : >Hi Olivier, > >On 04/24/2018 04:46 PM, Olivier Matz wrote: >> Some parts of dpdk use their own management threads. Most of the >time, >> the affinity of the thread is not properly set: it should not be >scheduled >> on the dataplane cores, because interrupting them can cause packet >losses. >> >> This patchset introduces a new wrapper for thread creation that does >> the job automatically, avoiding code duplication. >> >> v3: >> * new patch: use this API in examples when relevant. >> * replace pthread_kill by pthread_cancel. Note that pthread_join() >>is still needed. >> * rebase: vfio and pdump do not have control pthreads anymore, and >eal >>has 2 new pthreads >> * remove all calls to snprintf/strlcpy that truncate the thread name: >>all strings lengths are already < 16. >> >> v2: >> * set affinity to master core if no core is off, as suggested by >>Anatoly >> >> Olivier Matz (5): >>eal: use sizeof to avoid a double use of a define >>eal: new function to create control threads >>eal: set name when creating a control thread >>eal: set affinity for control threads >>examples: use new API to create control threads >> >> drivers/net/kni/Makefile | 1 + >> drivers/net/kni/rte_eth_kni.c| 3 +- >> examples/tep_termination/main.c | 16 +++ >> examples/vhost/main.c| 19 +++- >> lib/librte_eal/bsdapp/eal/eal.c | 4 +- >> lib/librte_eal/bsdapp/eal/eal_thread.c | 2 +- >> lib/librte_eal/common/eal_common_proc.c | 15 ++ >> lib/librte_eal/common/eal_common_thread.c| 72 > >> lib/librte_eal/common/include/rte_lcore.h| 26 ++ >> lib/librte_eal/linuxapp/eal/eal.c| 4 +- >> lib/librte_eal/linuxapp/eal/eal_interrupts.c | 17 ++- >> lib/librte_eal/linuxapp/eal/eal_thread.c | 2 +- >> lib/librte_eal/linuxapp/eal/eal_timer.c | 12 + >> lib/librte_eal/rte_eal_version.map | 1 + >> lib/librte_vhost/socket.c| 25 ++ >> 15 files changed, 135 insertions(+), 84 deletions(-) >> > >I face a deadlock issue with your series, that Jianfeng patch does not >resolve ("eal: fix threads block on barrier"). Reverting the series and >Jianfeng patch makes the issue to disappear. > >I face the problem in a VM (not seen on the host): ># ./install/bin/testpmd -l 0,1,2 --socket-mem 1024 -n 4 --proc-type >auto >--file-prefix pg -- --portmask=3 --forward-mode=macswap >--port-topology=chained --disable-rss -i --rxq=1 --txq=1 --rxd=256 >--txd=256 --nb-cores=2 --auto-start >EAL: Detected 3 lcore(s) >EAL: Detected 1 NUMA nodes >EAL: Auto-detected process type: PRIMARY >EAL: Multi-process socket /var/run/.pg_unix > > >Then it is stuck. Attaching with GDB, I get below backtrace >information: > >(gdb) info threads > Id Target Id Frame > 3Thread 0x7f63e1f9f700 (LWP 8808) "rte_mp_handle" >0x7f63e2591bfd in recvmsg () at >../sysdeps/unix/syscall-template.S:81 > 2Thread 0x7f63e179e700 (LWP 8809) "rte_mp_async" >pthread_barrier_wait () at >../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 >* 1Thread 0x7f63e32cec00 (LWP 8807) "testpmd" pthread_barrier_wait >() at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 >(gdb) bt full >#0 pthread_barrier_wait () at >../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71 >No locals. 
>#1 0x00520c54 in rte_ctrl_thread_create >(thread=thread@entry=0x7ffe5c895020, name=name@entry=0x869d86 >"rte_mp_async", attr=attr@entry=0x0, >start_routine=start_routine@entry=0x521030 , >arg=arg@entry=0x0) > at /root/src/dpdk/lib/librte_eal/common/eal_common_thread.c:207 > params = 0x17b1e40 > lcore_id = > cpuset = {__bits = {1, 0 }} > cpu_found = > ret = 0 >#2 0x005220b6 in rte_mp_channel_init () at >/root/src/dpdk/lib/librte_eal/common/eal_common_proc.c:674 >path = "/var/run\000.pg_unix_*", '\000' ... > dir_fd = 4 > mp_handle_tid = 140066969745152 > async_reply_handle_tid = 140066961352448 >#3 0x0050c227 in rte_eal_init (argc=argc@entry=23, >argv=argv@entry=0x7ffe5c896378) at >/root/src/dpdk/lib/librte_eal/linuxapp/eal/eal.c:775 > i = > fctret = 11 > ret = > thread_id = 140066989861888 > run_once = {cnt = 1} > logid = 0x17b1e00 "testpmd" > cpuset = "T}\211\\\376\177", '\000' , >"\020", '\000' ... > thread_name = "X}\211\\\376\177\000\000\226\301\036\342c\177\000" > __func__ = "rte_eal_init" >#4 0x00473214 in main (argc=23, argv=0x7ffe5c896378) at >/root/src/dpdk/app/test-pmd/testpmd.c:2597 > diag = > port_id = > ret = > __func__ = "main" >(gdb) thread 2 >[Switching to thread 2
[dpdk-dev] [PATCH] net/vmxnet3: convert to new rx offload api
Ethdev RX offloads API has changed since: commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API") This patch adopts the new RX Offload API in vmxnet3 driver. Signed-off-by: Louis Luo --- drivers/net/vmxnet3/vmxnet3_ethdev.c | 61 ++-- 1 file changed, 45 insertions(+), 16 deletions(-) diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c index 4568521..d9d5bda 100644 --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c @@ -42,6 +42,23 @@ #defineVMXNET3_TX_MAX_SEG UINT8_MAX +#define VMXNET3_TX_OFFLOAD_CAP \ + (DEV_TX_OFFLOAD_VLAN_INSERT | \ +DEV_TX_OFFLOAD_IPV4_CKSUM |\ +DEV_TX_OFFLOAD_TCP_CKSUM | \ +DEV_TX_OFFLOAD_UDP_CKSUM | \ +DEV_TX_OFFLOAD_TCP_TSO | \ +DEV_TX_OFFLOAD_MULTI_SEGS) + +#define VMXNET3_RX_OFFLOAD_CAP \ + (DEV_RX_OFFLOAD_VLAN_STRIP |\ +DEV_RX_OFFLOAD_SCATTER | \ +DEV_RX_OFFLOAD_IPV4_CKSUM |\ +DEV_RX_OFFLOAD_UDP_CKSUM | \ +DEV_RX_OFFLOAD_TCP_CKSUM | \ +DEV_RX_OFFLOAD_TCP_LRO | \ +DEV_RX_OFFLOAD_JUMBO_FRAME) + static int eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev); static int eth_vmxnet3_dev_uninit(struct rte_eth_dev *eth_dev); static int vmxnet3_dev_configure(struct rte_eth_dev *dev); @@ -376,9 +393,25 @@ vmxnet3_dev_configure(struct rte_eth_dev *dev) const struct rte_memzone *mz; struct vmxnet3_hw *hw = dev->data->dev_private; size_t size; + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; + uint64_t tx_offloads = dev->data->dev_conf.txmode.offloads; PMD_INIT_FUNC_TRACE(); + if ((rx_offloads & VMXNET3_RX_OFFLOAD_CAP) != rx_offloads) { + RTE_LOG(ERR, PMD, "Requested RX offloads 0x%lx" + " do not match supported 0x%lx\n", + rx_offloads, (uint64_t)VMXNET3_RX_OFFLOAD_CAP); + return -ENOTSUP; + } + + if ((tx_offloads & VMXNET3_TX_OFFLOAD_CAP) != tx_offloads) { + RTE_LOG(ERR, PMD, "Requested TX offloads 0x%lx" + " do not match supported 0x%lx\n", + tx_offloads, (uint64_t)VMXNET3_TX_OFFLOAD_CAP); + return -ENOTSUP; + } + if (dev->data->nb_tx_queues > VMXNET3_MAX_TX_QUEUES || dev->data->nb_rx_queues > VMXNET3_MAX_RX_QUEUES) { PMD_INIT_LOG(ERR, "ERROR: Number of queues not supported"); @@ -567,6 +600,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev) uint32_t mtu = dev->data->mtu; Vmxnet3_DriverShared *shared = hw->shared; Vmxnet3_DSDevRead *devRead = &shared->devRead; + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; uint32_t i; int ret; @@ -644,10 +678,10 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev) devRead->rxFilterConf.rxMode = 0; /* Setting up feature flags */ - if (dev->data->dev_conf.rxmode.hw_ip_checksum) + if (rx_offloads & DEV_RX_OFFLOAD_CHECKSUM) devRead->misc.uptFeatures |= VMXNET3_F_RXCSUM; - if (dev->data->dev_conf.rxmode.enable_lro) { + if (rx_offloads & DEV_RX_OFFLOAD_TCP_LRO) { devRead->misc.uptFeatures |= VMXNET3_F_LRO; devRead->misc.maxNumRxSG = 0; } @@ -1050,17 +1084,10 @@ vmxnet3_dev_info_get(struct rte_eth_dev *dev __rte_unused, .nb_mtu_seg_max = VMXNET3_MAX_TXD_PER_PKT, }; - dev_info->rx_offload_capa = - DEV_RX_OFFLOAD_VLAN_STRIP | - DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM | - DEV_RX_OFFLOAD_TCP_LRO; - - dev_info->tx_offload_capa = - DEV_TX_OFFLOAD_VLAN_INSERT | - DEV_TX_OFFLOAD_TCP_CKSUM | - DEV_TX_OFFLOAD_UDP_CKSUM | - DEV_TX_OFFLOAD_TCP_TSO; + dev_info->rx_offload_capa = VMXNET3_RX_OFFLOAD_CAP; + dev_info->rx_queue_offload_capa = 0; + dev_info->tx_offload_capa = VMXNET3_TX_OFFLOAD_CAP; + dev_info->tx_queue_offload_capa = 0; } static const uint32_t * @@ -1154,8 +1181,9 @@ vmxnet3_dev_promiscuous_disable(struct rte_eth_dev 
*dev) { struct vmxnet3_hw *hw = dev->data->dev_private; uint32_t *vf_table = hw->shared->devRead.rxFilterConf.vfTable; + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; - if (dev->data->dev_conf.rxmode.hw_vlan_filter) + if (rx_offloads & DEV_RX_OFFLOAD_VLAN_FILTER) memcpy(vf_table, hw->shadow_vfta, VMXNET3_VFT_TABLE_SIZE); else memset(vf_table, 0xff, VMXNET3_VFT_TABLE_SIZE); @@ -1217,9 +1245,10 @@ vmxnet3_dev_vlan_offload_set(struct rte_eth_dev *dev, int mask) struct vmxnet3_hw *hw = dev->data->dev_private; Vmxnet3_DSDevRead *devRead
Re: [dpdk-dev] [PATCH] net/vmxnet3: convert to new rx offload api
> -Original Message- > From: Louis Luo [mailto:llo...@vmware.com] > Sent: Monday, April 30, 2018 3:21 PM > To: Yong Wang > Cc: dev@dpdk.org; Louis Luo > Subject: [PATCH] net/vmxnet3: convert to new rx offload api > > Ethdev RX offloads API has changed since: commit ce17eddefc20 > ("ethdev: introduce Rx queue offloads API") > > This patch adopts the new RX Offload API in vmxnet3 driver. > > Signed-off-by: Louis Luo Acked-by: Yong Wang > --- > drivers/net/vmxnet3/vmxnet3_ethdev.c | 61 > ++-- > 1 file changed, 45 insertions(+), 16 deletions(-) > > diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c > b/drivers/net/vmxnet3/vmxnet3_ethdev.c > index 4568521..d9d5bda 100644 > --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c > +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c > @@ -42,6 +42,23 @@ > > #define VMXNET3_TX_MAX_SEG UINT8_MAX > > +#define VMXNET3_TX_OFFLOAD_CAP \ > + (DEV_TX_OFFLOAD_VLAN_INSERT | \ > + DEV_TX_OFFLOAD_IPV4_CKSUM |\ > + DEV_TX_OFFLOAD_TCP_CKSUM | \ > + DEV_TX_OFFLOAD_UDP_CKSUM | \ > + DEV_TX_OFFLOAD_TCP_TSO | \ > + DEV_TX_OFFLOAD_MULTI_SEGS) > + > +#define VMXNET3_RX_OFFLOAD_CAP \ > + (DEV_RX_OFFLOAD_VLAN_STRIP |\ > + DEV_RX_OFFLOAD_SCATTER | \ > + DEV_RX_OFFLOAD_IPV4_CKSUM |\ > + DEV_RX_OFFLOAD_UDP_CKSUM | \ > + DEV_RX_OFFLOAD_TCP_CKSUM | \ > + DEV_RX_OFFLOAD_TCP_LRO | \ > + DEV_RX_OFFLOAD_JUMBO_FRAME) > + > static int eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev); > static int eth_vmxnet3_dev_uninit(struct rte_eth_dev *eth_dev); > static int vmxnet3_dev_configure(struct rte_eth_dev *dev); > @@ -376,9 +393,25 @@ vmxnet3_dev_configure(struct rte_eth_dev *dev) > const struct rte_memzone *mz; > struct vmxnet3_hw *hw = dev->data->dev_private; > size_t size; > + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; > + uint64_t tx_offloads = dev->data->dev_conf.txmode.offloads; > > PMD_INIT_FUNC_TRACE(); > > + if ((rx_offloads & VMXNET3_RX_OFFLOAD_CAP) != rx_offloads) { > + RTE_LOG(ERR, PMD, "Requested RX offloads 0x%lx" > + " do not match supported 0x%lx\n", > + rx_offloads, > (uint64_t)VMXNET3_RX_OFFLOAD_CAP); > + return -ENOTSUP; > + } > + > + if ((tx_offloads & VMXNET3_TX_OFFLOAD_CAP) != tx_offloads) { > + RTE_LOG(ERR, PMD, "Requested TX offloads 0x%lx" > + " do not match supported 0x%lx\n", > + tx_offloads, (uint64_t)VMXNET3_TX_OFFLOAD_CAP); > + return -ENOTSUP; > + } > + > if (dev->data->nb_tx_queues > VMXNET3_MAX_TX_QUEUES || > dev->data->nb_rx_queues > VMXNET3_MAX_RX_QUEUES) { > PMD_INIT_LOG(ERR, "ERROR: Number of queues not > supported"); > @@ -567,6 +600,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev > *dev) > uint32_t mtu = dev->data->mtu; > Vmxnet3_DriverShared *shared = hw->shared; > Vmxnet3_DSDevRead *devRead = &shared->devRead; > + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; > uint32_t i; > int ret; > > @@ -644,10 +678,10 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev > *dev) > devRead->rxFilterConf.rxMode = 0; > > /* Setting up feature flags */ > - if (dev->data->dev_conf.rxmode.hw_ip_checksum) > + if (rx_offloads & DEV_RX_OFFLOAD_CHECKSUM) > devRead->misc.uptFeatures |= VMXNET3_F_RXCSUM; > > - if (dev->data->dev_conf.rxmode.enable_lro) { > + if (rx_offloads & DEV_RX_OFFLOAD_TCP_LRO) { > devRead->misc.uptFeatures |= VMXNET3_F_LRO; > devRead->misc.maxNumRxSG = 0; > } > @@ -1050,17 +1084,10 @@ vmxnet3_dev_info_get(struct rte_eth_dev *dev > __rte_unused, > .nb_mtu_seg_max = VMXNET3_MAX_TXD_PER_PKT, > }; > > - dev_info->rx_offload_capa = > - DEV_RX_OFFLOAD_VLAN_STRIP | > - DEV_RX_OFFLOAD_UDP_CKSUM | > - DEV_RX_OFFLOAD_TCP_CKSUM | > - 
DEV_RX_OFFLOAD_TCP_LRO; > - > - dev_info->tx_offload_capa = > - DEV_TX_OFFLOAD_VLAN_INSERT | > - DEV_TX_OFFLOAD_TCP_CKSUM | > - DEV_TX_OFFLOAD_UDP_CKSUM | > - DEV_TX_OFFLOAD_TCP_TSO; > + dev_info->rx_offload_capa = VMXNET3_RX_OFFLOAD_CAP; > + dev_info->rx_queue_offload_capa = 0; > + dev_info->tx_offload_capa = VMXNET3_TX_OFFLOAD_CAP; > + dev_info->tx_queue_offload_capa = 0; > } > > static const uint32_t * > @@ -1154,8 +1181,9 @@ vmxnet3_dev_promiscuous_disable(struct > rte_eth_dev *dev) > { > struct vmxnet3_hw *hw = dev->data->dev_private; > uint32_t *vf_table = hw->shared->devRead.rxFilterConf.vfTable; > + uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads; > > - if (dev->data->dev_conf.rxmode.hw_vlan_filter) > + if (rx_offloads & DEV_RX_OFFLOAD_VLAN_FILTER) [...]
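For context, a minimal application-side sketch of the configuration model this conversion targets (assumed code, not part of the patch; the port id and queue counts are placeholders). With the new API, per-port Rx offloads are requested through rxmode.offloads instead of the old hw_ip_checksum / enable_lro bit-fields, and PMDs such as vmxnet3 validate the requested mask in dev_configure:

#include <rte_ethdev.h>

static int
configure_port_new_offload_api(uint16_t port_id)
{
	struct rte_eth_conf conf = {
		.rxmode = {
			/* On 18.05-era releases this flag selects the new offloads API. */
			.ignore_offload_bitfield = 1,
			.offloads = DEV_RX_OFFLOAD_CHECKSUM |
				    DEV_RX_OFFLOAD_TCP_LRO,
		},
	};

	/* Returns -ENOTSUP if an offload outside the PMD's capability mask
	 * is requested. */
	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}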
[dpdk-dev] [PATCH 03/12] net/bnxt: rename driver version from Cumulus to NetXtreme
From: Scott Branden Rename driver version from "Broadcom Cumulus driver" to "Broadcom NetXtreme driver" to reflect this driver is applicable to NetXtreme family beyond Cumulus. Signed-off-by: Scott Branden Reviewed-by: Ajit Kumar Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 3e2ccfa90..58241ccac 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -29,7 +29,7 @@ #define DRV_MODULE_NAME"bnxt" static const char bnxt_version[] = - "Broadcom Cumulus driver " DRV_MODULE_NAME "\n"; + "Broadcom NetXtreme driver " DRV_MODULE_NAME "\n"; int bnxt_logtype_driver; #define PCI_VENDOR_ID_BROADCOM 0x14E4 -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 04/12] net/bnxt: return EINVAL instead of ENOSPC on invalid max ring
From: Jay Ding Return EINVAL instead of ENOSPC when an invalid queue_idx is passed to the Rx and Tx queue_setup_op routines. Signed-off-by: Jay Ding Signed-off-by: Scott Branden Reviewed-by: Ray Jui Reviewed-by: Ajit Kumar Khaparde --- drivers/net/bnxt/bnxt_rxq.c | 2 +- drivers/net/bnxt/bnxt_txq.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/bnxt/bnxt_rxq.c b/drivers/net/bnxt/bnxt_rxq.c index e939c9ac0..4e6fa4e30 100644 --- a/drivers/net/bnxt/bnxt_rxq.c +++ b/drivers/net/bnxt/bnxt_rxq.c @@ -290,7 +290,7 @@ int bnxt_rx_queue_setup_op(struct rte_eth_dev *eth_dev, PMD_DRV_LOG(ERR, "Cannot create Rx ring %d. Only %d rings available\n", queue_idx, bp->max_rx_rings); - return -ENOSPC; + return -EINVAL; } if (!nb_desc || nb_desc > MAX_RX_DESC_CNT) { diff --git a/drivers/net/bnxt/bnxt_txq.c b/drivers/net/bnxt/bnxt_txq.c index 07e25d77b..b50f37cf2 100644 --- a/drivers/net/bnxt/bnxt_txq.c +++ b/drivers/net/bnxt/bnxt_txq.c @@ -86,7 +86,7 @@ int bnxt_tx_queue_setup_op(struct rte_eth_dev *eth_dev, PMD_DRV_LOG(ERR, "Cannot create Tx ring %d. Only %d rings available\n", queue_idx, bp->max_tx_rings); - return -ENOSPC; + return -EINVAL; } if (!nb_desc || nb_desc > MAX_TX_DESC_CNT) { -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 01/12] net/bnxt: add support for lsc interrupt event
From: Qingmin Liu Add support to bnxt driver to register RTE_ETH_EVENT_INTR_LSC event and monitor physical link status. Signed-off-by: Qingmin Liu Signed-off-by: Scott Branden Signed-off-by: Ajit Kumar Khaparde Reviewed-by: Randy Schacher --- drivers/net/bnxt/bnxt_ethdev.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 348129dad..229017ace 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -780,6 +780,11 @@ int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete) new.link_speed != eth_dev->data->dev_link.link_speed) { memcpy(ð_dev->data->dev_link, &new, sizeof(struct rte_eth_link)); + + _rte_eth_dev_callback_process(eth_dev, + RTE_ETH_EVENT_INTR_LSC, + NULL); + bnxt_print_link_info(eth_dev); } -- 2.15.1 (Apple Git-101)
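For reference, a hedged sketch of how an application consumes the link-status-change events the driver now reports (assumed application code, not from the patch; the callback signature follows the ethdev API of this period):

#include <stdio.h>
#include <rte_ethdev.h>

static int
lsc_event_cb(uint16_t port_id, enum rte_eth_event_type type,
	     void *cb_arg, void *ret_param)
{
	(void)cb_arg;
	(void)ret_param;
	if (type == RTE_ETH_EVENT_INTR_LSC)
		printf("port %u: link status changed\n", port_id);
	return 0;
}

/* During setup (port_id is a placeholder):
 *	conf.intr_conf.lsc = 1;		-- enable link status interrupts
 *	rte_eth_dev_configure(port_id, 1, 1, &conf);
 *	rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_LSC,
 *				      lsc_event_cb, NULL);
 */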
[dpdk-dev] [PATCH 02/12] net/bnxt: rename function checking MAC address
From: Scott Branden rename check_zero_bytes to bnxt_check_zero_bytes to match proper prefix. Signed-off-by: Scott Branden Signed-off-by: Ajit Kumar Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 2 +- drivers/net/bnxt/bnxt_filter.c | 8 +--- drivers/net/bnxt/bnxt_filter.h | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 229017ace..3e2ccfa90 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3286,7 +3286,7 @@ bnxt_dev_init(struct rte_eth_dev *eth_dev) goto error_free; } - if (check_zero_bytes(bp->dflt_mac_addr, ETHER_ADDR_LEN)) { + if (bnxt_check_zero_bytes(bp->dflt_mac_addr, ETHER_ADDR_LEN)) { PMD_DRV_LOG(ERR, "Invalid MAC addr %02X:%02X:%02X:%02X:%02X:%02X\n", bp->dflt_mac_addr[0], bp->dflt_mac_addr[1], diff --git a/drivers/net/bnxt/bnxt_filter.c b/drivers/net/bnxt/bnxt_filter.c index dadd1e32f..e36da9977 100644 --- a/drivers/net/bnxt/bnxt_filter.c +++ b/drivers/net/bnxt/bnxt_filter.c @@ -231,7 +231,7 @@ nxt_non_void_action(const struct rte_flow_action *cur) } } -int check_zero_bytes(const uint8_t *bytes, int len) +int bnxt_check_zero_bytes(const uint8_t *bytes, int len) { int i; for (i = 0; i < len; i++) @@ -512,13 +512,15 @@ bnxt_validate_and_parse_flow_type(struct bnxt *bp, ipv6_spec->hdr.src_addr, 16); rte_memcpy(filter->dst_ipaddr, ipv6_spec->hdr.dst_addr, 16); - if (!check_zero_bytes(ipv6_mask->hdr.src_addr, 16)) { + if (!bnxt_check_zero_bytes(ipv6_mask->hdr.src_addr, + 16)) { rte_memcpy(filter->src_ipaddr_mask, ipv6_mask->hdr.src_addr, 16); en |= !use_ntuple ? 0 : NTUPLE_FLTR_ALLOC_INPUT_EN_SRC_IPADDR_MASK; } - if (!check_zero_bytes(ipv6_mask->hdr.dst_addr, 16)) { + if (!bnxt_check_zero_bytes(ipv6_mask->hdr.dst_addr, + 16)) { rte_memcpy(filter->dst_ipaddr_mask, ipv6_mask->hdr.dst_addr, 16); en |= !use_ntuple ? 0 : diff --git a/drivers/net/bnxt/bnxt_filter.h b/drivers/net/bnxt/bnxt_filter.h index c70b127ac..d27be7032 100644 --- a/drivers/net/bnxt/bnxt_filter.h +++ b/drivers/net/bnxt/bnxt_filter.h @@ -69,7 +69,7 @@ struct bnxt_filter_info *bnxt_get_unused_filter(struct bnxt *bp); void bnxt_free_filter(struct bnxt *bp, struct bnxt_filter_info *filter); struct bnxt_filter_info *bnxt_get_l2_filter(struct bnxt *bp, struct bnxt_filter_info *nf, struct bnxt_vnic_info *vnic); -int check_zero_bytes(const uint8_t *bytes, int len); +int bnxt_check_zero_bytes(const uint8_t *bytes, int len); #define NTUPLE_FLTR_ALLOC_INPUT_EN_SRC_MACADDR \ HWRM_CFA_NTUPLE_FILTER_ALLOC_INPUT_ENABLES_SRC_MACADDR -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 06/12] net/bnxt: set MTU in dev config for jumbo packets
From: Qingmin Liu MTU setting does not take effect after rte_eth_dev_configure is called with jumbo frames enabled unless the MTU is also configured through the set_mtu dev_op. Fixes: daef48efe5e5 ("net/bnxt: support set MTU") Cc: sta...@dpdk.org Signed-off-by: Qingmin Liu Signed-off-by: Scott Branden Reviewed-by: Jay Ding Reviewed-by: Randy Schacher Reviewed-by: Ajit Kumar Khaparde Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 58241ccac..e68608f61 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -151,6 +151,7 @@ static const struct rte_pci_id bnxt_pci_id_map[] = { static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask); static void bnxt_print_link_info(struct rte_eth_dev *eth_dev); +static int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu); /***/ @@ -548,10 +549,12 @@ static int bnxt_dev_configure_op(struct rte_eth_dev *eth_dev) bp->rx_cp_nr_rings = bp->rx_nr_rings; bp->tx_cp_nr_rings = bp->tx_nr_rings; - if (rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) + if (rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { eth_dev->data->mtu = eth_dev->data->dev_conf.rxmode.max_rx_pkt_len - ETHER_HDR_LEN - ETHER_CRC_LEN - VLAN_TAG_SIZE; + bnxt_mtu_set_op(eth_dev, eth_dev->data->mtu); + } return 0; } -- 2.15.1 (Apple Git-101)
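A short application-side sketch of the path this fix targets (assumed code, not from the patch; the 9018-byte frame length and queue counts are placeholders). With the fix, the MTU derived in dev_configure is also programmed into the firmware via the driver's mtu_set op, so a separate rte_eth_dev_set_mtu() call is no longer required for jumbo frames to take effect:

#include <rte_ethdev.h>

static int
enable_jumbo_frames(uint16_t port_id)
{
	struct rte_eth_conf conf = {
		.rxmode = {
			.ignore_offload_bitfield = 1,
			.offloads = DEV_RX_OFFLOAD_JUMBO_FRAME,
			.max_rx_pkt_len = 9018,
		},
	};

	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}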
[dpdk-dev] [PATCH 05/12] net/bnxt: Validate structs and pointers before use
From: Rahul Gupta Validate that pointers, including txq and rxq, do not point to uninitialized areas before using them, to prevent the bnxt driver from crashing. Signed-off-by: Rahul Gupta Signed-off-by: Jay Ding Signed-off-by: Scott Branden Reviewed-by: Ray Jui Reviewed-by: Ajit Kumar Khaparde Reviewed-by: Randy Schacher Tested-by: Randy Schacher --- drivers/net/bnxt/bnxt_ring.c | 3 +++ drivers/net/bnxt/bnxt_rxq.c | 6 ++ drivers/net/bnxt/bnxt_txq.c | 9 + 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/bnxt_ring.c index 8e822e11f..aa9f3f4cc 100644 --- a/drivers/net/bnxt/bnxt_ring.c +++ b/drivers/net/bnxt/bnxt_ring.c @@ -24,6 +24,9 @@ void bnxt_free_ring(struct bnxt_ring *ring) { + if (!ring) + return; + if (ring->vmem_size && *ring->vmem) { memset((char *)*ring->vmem, 0, ring->vmem_size); *ring->vmem = NULL; diff --git a/drivers/net/bnxt/bnxt_rxq.c b/drivers/net/bnxt/bnxt_rxq.c index 4e6fa4e30..4b380d4f0 100644 --- a/drivers/net/bnxt/bnxt_rxq.c +++ b/drivers/net/bnxt/bnxt_rxq.c @@ -23,10 +23,8 @@ void bnxt_free_rxq_stats(struct bnxt_rx_queue *rxq) { - struct bnxt_cp_ring_info *cpr = rxq->cp_ring; - - if (cpr->hw_stats) - cpr->hw_stats = NULL; + if (rxq && rxq->cp_ring && rxq->cp_ring->hw_stats) + rxq->cp_ring->hw_stats = NULL; } int bnxt_mq_rx_configure(struct bnxt *bp) diff --git a/drivers/net/bnxt/bnxt_txq.c b/drivers/net/bnxt/bnxt_txq.c index b50f37cf2..b9b975e4c 100644 --- a/drivers/net/bnxt/bnxt_txq.c +++ b/drivers/net/bnxt/bnxt_txq.c @@ -19,10 +19,8 @@ void bnxt_free_txq_stats(struct bnxt_tx_queue *txq) { - struct bnxt_cp_ring_info *cpr = txq->cp_ring; - - if (cpr->hw_stats) - cpr->hw_stats = NULL; + if (txq && txq->cp_ring && txq->cp_ring->hw_stats) + txq->cp_ring->hw_stats = NULL; } static void bnxt_tx_queue_release_mbufs(struct bnxt_tx_queue *txq) @@ -30,6 +28,9 @@ static void bnxt_tx_queue_release_mbufs(struct bnxt_tx_queue *txq) struct bnxt_sw_tx_bd *sw_ring; uint16_t i; + if (!txq) + return; + sw_ring = txq->tx_ring->tx_buf_ring; if (sw_ring) { for (i = 0; i < txq->tx_ring->tx_ring_struct->ring_size; i++) { -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 07/12] net/bnxt: fix MTU calculation
We were not considering the case of nested VLANs while calculating MTU. This patch takes care of the same. Fixes: daef48efe5e5 ("net/bnxt: support set MTU") Cc: sta...@dpdk.org Signed-off-by: Qingmin Liu Signed-off-by: Scott Branden Reviewed-by: Jay Ding Reviewed-by: Ajit Kumar Khaparde Reviewed-by: Randy Schacher Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h| 1 + drivers/net/bnxt/bnxt_ethdev.c | 3 ++- drivers/net/bnxt/bnxt_hwrm.c | 9 ++--- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 97b0e0853..110cdb992 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -23,6 +23,7 @@ #define BNXT_MAX_MTU 9500 #define VLAN_TAG_SIZE 4 #define BNXT_MAX_LED 4 +#define BNXT_NUM_VLANS 2 struct bnxt_led_info { uint8_t led_id; diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index e68608f61..20ed0a31f 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -552,7 +552,8 @@ static int bnxt_dev_configure_op(struct rte_eth_dev *eth_dev) if (rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) { eth_dev->data->mtu = eth_dev->data->dev_conf.rxmode.max_rx_pkt_len - - ETHER_HDR_LEN - ETHER_CRC_LEN - VLAN_TAG_SIZE; + ETHER_HDR_LEN - ETHER_CRC_LEN - VLAN_TAG_SIZE * + BNXT_NUM_VLANS; bnxt_mtu_set_op(eth_dev, eth_dev->data->mtu); } return 0; diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index bc8773509..c136edc06 100644 --- a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -2360,7 +2360,8 @@ static int bnxt_hwrm_pf_func_cfg(struct bnxt *bp, int tx_rings) req.flags = rte_cpu_to_le_32(bp->pf.func_cfg_flags); req.mtu = rte_cpu_to_le_16(BNXT_MAX_MTU); req.mru = rte_cpu_to_le_16(bp->eth_dev->data->mtu + ETHER_HDR_LEN + - ETHER_CRC_LEN + VLAN_TAG_SIZE); + ETHER_CRC_LEN + VLAN_TAG_SIZE * + BNXT_NUM_VLANS); req.num_rsscos_ctxs = rte_cpu_to_le_16(bp->max_rsscos_ctx); req.num_stat_ctxs = rte_cpu_to_le_16(bp->max_stat_ctx); req.num_cmpl_rings = rte_cpu_to_le_16(bp->max_cp_rings); @@ -2397,9 +2398,11 @@ static void populate_vf_func_cfg_req(struct bnxt *bp, HWRM_FUNC_CFG_INPUT_ENABLES_NUM_HW_RING_GRPS); req->mtu = rte_cpu_to_le_16(bp->eth_dev->data->mtu + ETHER_HDR_LEN + - ETHER_CRC_LEN + VLAN_TAG_SIZE); + ETHER_CRC_LEN + VLAN_TAG_SIZE * + BNXT_NUM_VLANS); req->mru = rte_cpu_to_le_16(bp->eth_dev->data->mtu + ETHER_HDR_LEN + - ETHER_CRC_LEN + VLAN_TAG_SIZE); + ETHER_CRC_LEN + VLAN_TAG_SIZE * + BNXT_NUM_VLANS); req->num_rsscos_ctxs = rte_cpu_to_le_16(bp->max_rsscos_ctx / (num_vfs + 1)); req->num_stat_ctxs = rte_cpu_to_le_16(bp->max_stat_ctx / (num_vfs + 1)); -- 2.15.1 (Apple Git-101)
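The overhead arithmetic the patch applies, shown as a standalone sketch (the constant values mirror the usual DPDK ones and appear here only for illustration):

#include <stdint.h>

#define ETHER_HDR_LEN	14
#define ETHER_CRC_LEN	4
#define VLAN_TAG_SIZE	4
#define BNXT_NUM_VLANS	2	/* outer + inner tag for QinQ frames */

static inline uint16_t
frame_len_to_mtu(uint16_t max_rx_pkt_len)
{
	/* e.g. 9026 -> 9026 - 14 - 4 - 2 * 4 = 9000 */
	return max_rx_pkt_len - ETHER_HDR_LEN - ETHER_CRC_LEN -
	       VLAN_TAG_SIZE * BNXT_NUM_VLANS;
}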
[dpdk-dev] [PATCH 08/12] net/bnxt: return error if init is not complete before accessing stats
From: Jay Ding return error if init is not complete before accessing stats. Fixes: ed2ced6fe927 ("net/bnxt: check initialization before accessing stats") Cc: sta...@dpdk.org Signed-off-by: Jay Ding Signed-off-by: Scott Branden Reviewed-by: Ajit Kumar Khaparde Reviewed-by: Randy Schacher Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_stats.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/bnxt/bnxt_stats.c b/drivers/net/bnxt/bnxt_stats.c index 1b586f333..c1a8fad09 100644 --- a/drivers/net/bnxt/bnxt_stats.c +++ b/drivers/net/bnxt/bnxt_stats.c @@ -210,7 +210,7 @@ int bnxt_stats_get_op(struct rte_eth_dev *eth_dev, memset(bnxt_stats, 0, sizeof(*bnxt_stats)); if (!(bp->flags & BNXT_FLAG_INIT_DONE)) { PMD_DRV_LOG(ERR, "Device Initialization not complete!\n"); - return 0; + return -1; } for (i = 0; i < bp->rx_cp_nr_rings; i++) { -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 00/12] bnxt patchset
Patchset against dpdk-next-net. Please apply. Ajit Khaparde (3): net/bnxt: fix MTU calculation net/bnxt: fix to reset status of initialization net/bnxt: fix usage of vnic id Jay Ding (2): net/bnxt: return EINVAL instead of ENOSPC on invalid max ring net/bnxt: return error if init is not complete before accessing stats Qingmin Liu (2): net/bnxt: add support for lsc interrupt event net/bnxt: set MTU in dev config for jumbo packets Rahul Gupta (1): net/bnxt: Validate structs and pointers before use Randy Schacher (1): net/bnxt: clear HWRM sniffer list for PFs Scott Branden (2): net/bnxt: rename function checking MAC address net/bnxt: rename driver version from Cumulus to NetXtreme Xiaoxin Peng (1): net/bnxt: fix rx mbuf and agg ring leak in dev stop drivers/net/bnxt/bnxt.h| 1 + drivers/net/bnxt/bnxt_ethdev.c | 23 --- drivers/net/bnxt/bnxt_filter.c | 8 +--- drivers/net/bnxt/bnxt_filter.h | 2 +- drivers/net/bnxt/bnxt_hwrm.c | 26 +++--- drivers/net/bnxt/bnxt_ring.c | 3 +++ drivers/net/bnxt/bnxt_rxq.c| 14 +++--- drivers/net/bnxt/bnxt_stats.c | 2 +- drivers/net/bnxt/bnxt_txq.c| 11 ++- 9 files changed, 59 insertions(+), 31 deletions(-) -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 10/12] net/bnxt: fix rx mbuf and agg ring leak in dev stop
From: Xiaoxin Peng In the start/stop_op operations, mbufs allocated for the rings were not freed. 1) Add bnxt_free_tx_mbufs/bnxt_free_rx_mbufs in bnxt_dev_stop_op to free the mbufs before freeing the rings. 2) The mbuf allocation and free routines were not in sync: allocation uses ring->ring_size, which includes any rounding-up and multiplying factors, while the free routine used the requested queue size. Fixes: c09f57b49c13 ("net/bnxt: add start/stop/link update operations") Cc: sta...@dpdk.org Signed-off-by: Jay Ding Signed-off-by: Scott Branden Reviewed-by: Ray Jui Reviewed-by: Randy Schacher Signed-off-by: Xiaoxin Peng Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 4 ++-- drivers/net/bnxt/bnxt_rxq.c| 6 -- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 352fc30b4..dc445f9a5 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -655,6 +655,8 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev) } bnxt_set_hwrm_link_config(bp, false); bnxt_hwrm_port_clr_stats(bp); + bnxt_free_tx_mbufs(bp); + bnxt_free_rx_mbufs(bp); bnxt_shutdown_nic(bp); bp->dev_stopped = 1; } @@ -666,8 +668,6 @@ static void bnxt_dev_close_op(struct rte_eth_dev *eth_dev) if (bp->dev_stopped == 0) bnxt_dev_stop_op(eth_dev); - bnxt_free_tx_mbufs(bp); - bnxt_free_rx_mbufs(bp); bnxt_free_mem(bp); if (eth_dev->data->mac_addrs != NULL) { rte_free(eth_dev->data->mac_addrs); diff --git a/drivers/net/bnxt/bnxt_rxq.c b/drivers/net/bnxt/bnxt_rxq.c index 4b380d4f0..866fb56b1 100644 --- a/drivers/net/bnxt/bnxt_rxq.c +++ b/drivers/net/bnxt/bnxt_rxq.c @@ -207,7 +207,8 @@ static void bnxt_rx_queue_release_mbufs(struct bnxt_rx_queue *rxq) if (rxq) { sw_ring = rxq->rx_ring->rx_buf_ring; if (sw_ring) { - for (i = 0; i < rxq->nb_rx_desc; i++) { + for (i = 0; +i < rxq->rx_ring->rx_ring_struct->ring_size; i++) { if (sw_ring[i].mbuf) { rte_pktmbuf_free_seg(sw_ring[i].mbuf); sw_ring[i].mbuf = NULL; @@ -217,7 +218,8 @@ static void bnxt_rx_queue_release_mbufs(struct bnxt_rx_queue *rxq) /* Free up mbufs in Agg ring */ sw_ring = rxq->rx_ring->ag_buf_ring; if (sw_ring) { - for (i = 0; i < rxq->nb_rx_desc; i++) { + for (i = 0; +i < rxq->rx_ring->ag_ring_struct->ring_size; i++) { if (sw_ring[i].mbuf) { rte_pktmbuf_free_seg(sw_ring[i].mbuf); sw_ring[i].mbuf = NULL; -- 2.15.1 (Apple Git-101)
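To illustrate point 2) above, a hedged sketch of why the release loop must be bounded by ring_size rather than by the requested descriptor count (the exact rounding used by the driver is assumed here for illustration):

/*
 * Assumed example: if ring creation rounds the requested count up,
 * e.g. ring_size = rte_align32pow2(nb_desc) turning 1000 into 1024,
 * then mbufs can be posted into any of the ring_size slots.  A free
 * loop bounded by nb_rx_desc would leak the last
 * (ring_size - nb_rx_desc) mbufs on every stop, which is why the fix
 * iterates over rx_ring_struct->ring_size (and ag_ring_struct->ring_size
 * for the aggregation ring) instead.
 */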
[dpdk-dev] [PATCH 09/12] net/bnxt: fix to reset status of initialization
clear flag on stop at proper location to avoid race conditions. Fixes: ed2ced6fe927 ("net/bnxt: check initialization before accessing stats") Cc: sta...@dpdk.org Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 20ed0a31f..352fc30b4 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -648,13 +648,13 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev) { struct bnxt *bp = (struct bnxt *)eth_dev->data->dev_private; + bp->flags &= ~BNXT_FLAG_INIT_DONE; if (bp->eth_dev->data->dev_started) { /* TBD: STOP HW queues DMA */ eth_dev->data->dev_link.link_status = 0; } bnxt_set_hwrm_link_config(bp, false); bnxt_hwrm_port_clr_stats(bp); - bp->flags &= ~BNXT_FLAG_INIT_DONE; bnxt_shutdown_nic(bp); bp->dev_stopped = 1; } -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 12/12] net/bnxt: clear HWRM sniffer list for PFs
From: Randy Schacher Clear HWRM sniffer list for DPDK PFs so that VFs on DPDK PFs initialize successfully. DPDK PF driver does not handle HWRM commands from VFs. Signed-off-by: Randy Schacher Signed-off-by: Scott Branden Reviewed-by: Ajit Kumar Khaparde --- drivers/net/bnxt/bnxt_hwrm.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index d3c50e490..5b9840d4f 100644 --- a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -611,6 +611,15 @@ int bnxt_hwrm_func_driver_register(struct bnxt *bp) memcpy(req.vf_req_fwd, bp->pf.vf_req_fwd, RTE_MIN(sizeof(req.vf_req_fwd), sizeof(bp->pf.vf_req_fwd))); + + /* +* PF can sniff HWRM API issued by VF. This can be set up by +* linux driver and inherited by the DPDK PF driver. Clear +* this HWRM sniffer list in FW because DPDK PF driver does +* not support this. +*/ + req.flags = + rte_cpu_to_le_32(HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_FWD_NONE_MODE); } req.async_event_fwd[0] |= -- 2.15.1 (Apple Git-101)
[dpdk-dev] [PATCH 11/12] net/bnxt: fix usage of vnic id
VNIC ID returned by the FW is a 16-bit field. We are incorrectly using it as a 32-bit value in few places. This patch corrects that. Fixes: daef48efe5e5 ("net/bnxt: support set MTU") Cc: sta...@dpdk.org Signed-off-by: Ajit Khaparde Signed-off-by: Scott Branden Reviewed-by: Michael Wildt Reviewed-by: Randy Schacher Reviewed-by: Ray Jui --- drivers/net/bnxt/bnxt_hwrm.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index c136edc06..d3c50e490 100644 --- a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -1212,7 +1212,7 @@ static int bnxt_hwrm_vnic_plcmodes_qcfg(struct bnxt *bp, HWRM_PREP(req, VNIC_PLCMODES_QCFG); - req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); + req.vnic_id = rte_cpu_to_le_16(vnic->fw_vnic_id); rc = bnxt_hwrm_send_message(bp, &req, sizeof(req)); @@ -1240,7 +1240,7 @@ static int bnxt_hwrm_vnic_plcmodes_cfg(struct bnxt *bp, HWRM_PREP(req, VNIC_PLCMODES_CFG); - req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); + req.vnic_id = rte_cpu_to_le_16(vnic->fw_vnic_id); req.flags = rte_cpu_to_le_32(pmode->flags); req.jumbo_thresh = rte_cpu_to_le_16(pmode->jumbo_thresh); req.hds_offset = rte_cpu_to_le_16(pmode->hds_offset); @@ -1484,7 +1484,7 @@ int bnxt_hwrm_vnic_plcmode_cfg(struct bnxt *bp, size -= RTE_PKTMBUF_HEADROOM; req.jumbo_thresh = rte_cpu_to_le_16(size); - req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); + req.vnic_id = rte_cpu_to_le_16(vnic->fw_vnic_id); rc = bnxt_hwrm_send_message(bp, &req, sizeof(req)); @@ -1520,7 +1520,7 @@ int bnxt_hwrm_vnic_tpa_cfg(struct bnxt *bp, rte_cpu_to_le_16(HWRM_VNIC_TPA_CFG_INPUT_MAX_AGGS_MAX); req.min_agg_len = rte_cpu_to_le_32(512); } - req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id); + req.vnic_id = rte_cpu_to_le_16(vnic->fw_vnic_id); rc = bnxt_hwrm_send_message(bp, &req, sizeof(req)); -- 2.15.1 (Apple Git-101)
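A hedged sketch of why the conversion width matters for this 16-bit field (not from the patch; the values assume the standard rte_byteorder semantics):

#include <assert.h>
#include <stdint.h>
#include <rte_byteorder.h>

static void
vnic_id_conversion_example(void)
{
	uint16_t id = 0x0001;
	uint16_t good = rte_cpu_to_le_16(id);
	uint16_t bad = (uint16_t)rte_cpu_to_le_32(id);

#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
	/* The 32-bit swap moves the id into the upper bytes, which are
	 * then lost when the result is truncated into the 16-bit field. */
	assert(good == 0x0100 && bad == 0x0000);
#else
	/* On little-endian x86 both forms happen to match, which is why
	 * the bug went unnoticed there. */
	assert(good == id && bad == id);
#endif
}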
Re: [dpdk-dev] [PATCH 5/8 v4] raw/dpaa2_qdma: introduce the DPAA2 QDMA driver
> -Original Message- > From: Thomas Monjalon [mailto:tho...@monjalon.net] > Sent: Monday, April 30, 2018 6:05 PM > To: Nipun Gupta > Cc: dev@dpdk.org; Shreyansh Jain ; Hemant > Agrawal > Subject: Re: [dpdk-dev] [PATCH 5/8 v4] raw/dpaa2_qdma: introduce the > DPAA2 QDMA driver > > 24/04/2018 13:49, Nipun Gupta: > > drivers/raw/dpaa2_qdma/dpaa2_qdma.c| 294 > + > > drivers/raw/dpaa2_qdma/dpaa2_qdma.h| 66 + > > drivers/raw/dpaa2_qdma/dpaa2_qdma_logs.h | 46 > [...] > > +install_headers('rte_pmd_dpaa2_qdma.h') > > I think you need to rename the exported header file with rte_pmd_ prefix. Sorry, I did not get it. Filename is already with rte_pmd_ prefix. Thanks, Nipun >
Re: [dpdk-dev] [PATCH 5/8 v4] raw/dpaa2_qdma: introduce the DPAA2 QDMA driver
> -Original Message- > From: Nipun Gupta > Sent: Tuesday, May 1, 2018 11:44 AM > To: 'Thomas Monjalon' > Cc: dev@dpdk.org; Shreyansh Jain ; Hemant > Agrawal > Subject: RE: [dpdk-dev] [PATCH 5/8 v4] raw/dpaa2_qdma: introduce the > DPAA2 QDMA driver > > > > > -Original Message- > > From: Thomas Monjalon [mailto:tho...@monjalon.net] > > Sent: Monday, April 30, 2018 6:05 PM > > To: Nipun Gupta > > Cc: dev@dpdk.org; Shreyansh Jain ; Hemant > > Agrawal > > Subject: Re: [dpdk-dev] [PATCH 5/8 v4] raw/dpaa2_qdma: introduce the > > DPAA2 QDMA driver > > > > 24/04/2018 13:49, Nipun Gupta: > > > drivers/raw/dpaa2_qdma/dpaa2_qdma.c| 294 > > + > > > drivers/raw/dpaa2_qdma/dpaa2_qdma.h| 66 + > > > drivers/raw/dpaa2_qdma/dpaa2_qdma_logs.h | 46 > > [...] > > > +install_headers('rte_pmd_dpaa2_qdma.h') > > > > I think you need to rename the exported header file with rte_pmd_ prefix. Got it. This should be in the next patch where 'rte_pmd_dpaa2_qdma.h' file has been introduced. I will fix it and respin a version. > > Sorry, I did not get it. Filename is already with rte_pmd_ prefix. > > Thanks, > Nipun > > >