Re: [dpdk-dev] [PATCH 02/20] crypto/cnxk: add probe and remove

2021-06-16 Thread Akhil Goyal
> +/*
> + * DP logs, toggled out at compile time if level lower than current level.
> + * DP logs would be logged under 'PMD' type. So for dynamic logging, the
> + * level of 'pmd' has to be used.
> + */
> +#define CPT_LOG_DP(level, fmt, args...) RTE_LOG_DP(level, PMD, fmt "\n", ##args)
> +
> +#define CPT_LOG_DP_DEBUG(fmt, args...) CPT_LOG_DP(DEBUG, fmt, ##args)
> +#define CPT_LOG_DP_INFO(fmt, args...)  CPT_LOG_DP(INFO, fmt, ##args)
> +#define CPT_LOG_DP_WARN(fmt, args...)  CPT_LOG_DP(WARNING, fmt, ##args)
> +#define CPT_LOG_DP_ERR(fmt, args...)   CPT_LOG_DP(ERR, fmt, ##args)
> +
There are two types of formatting for logging used in this PMD.
Can you make them common?
I believe these can be moved to common/cnxk/ and have plt_cpt_dp_log()
or something like that.




Re: [dpdk-dev] [PATCH 05/20] crypto/cnxk: add queue pair ops

2021-06-16 Thread Akhil Goyal
> diff --git a/drivers/crypto/cnxk/cnxk_cpt_ops_helper.c
> b/drivers/crypto/cnxk/cnxk_cpt_ops_helper.c
> new file mode 100644
> index 000..103195e
> --- /dev/null
> +++ b/drivers/crypto/cnxk/cnxk_cpt_ops_helper.c
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +
> +#include 
> +
> +#include "hw/cpt.h"
> +#include "roc_api.h"
> +
> +#include "cnxk_cpt_ops_helper.h"
> +
> +int
> +cnxk_cpt_ops_helper_get_mlen(void)
> +{
> + uint32_t len;
> +
> + /* For MAC */
> + len = 2 * sizeof(uint64_t);
> + len += ROC_SE_MAX_MAC_LEN * sizeof(uint8_t);
> +
> + len += CPT_OFFSET_CONTROL_BYTES + CPT_MAX_IV_LEN;
> + len += RTE_ALIGN_CEIL((ROC_SE_SG_LIST_HDR_SIZE +
> +			(RTE_ALIGN_CEIL(ROC_SE_MAX_SG_IN_OUT_CNT, 4) >>
> +			 2) * SG_ENTRY_SIZE),
> +		       8);
> +
> + return len;
> +}
> diff --git a/drivers/crypto/cnxk/cnxk_cpt_ops_helper.h
> b/drivers/crypto/cnxk/cnxk_cpt_ops_helper.h
> new file mode 100644
> index 000..23c6fed
> --- /dev/null
> +++ b/drivers/crypto/cnxk/cnxk_cpt_ops_helper.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +
> +#ifndef _CNXK_CPT_OPS_HELPER_H_
> +#define _CNXK_CPT_OPS_HELPER_H_
> +
> +#define CPT_MAX_IV_LEN 16
> +#define CPT_OFFSET_CONTROL_BYTES 8
> +#define SG_ENTRY_SIZE sizeof(struct roc_se_sglist_comp)
> +
> +/*
> + * Get size of contiguous meta buffer to be allocated
> + *
> + * @return
> + *   - length
> + */
> +int cnxk_cpt_ops_helper_get_mlen(void);
> +
> +#endif /* _CNXK_CPT_OPS_HELPER_H_ */

Why do we need these separate helper files? They have only one function and a few
macros, which can easily be moved to drivers/crypto/cnxk/cnxk_cryptodev_ops.c/.h.




Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Burakov, Anatoly

On 14-Jun-21 11:58 AM, Thomas Monjalon wrote:

Performance of access in a fixed-size array is very good
because of cache locality
and because there is a single pointer to dereference.
The only drawback is the lack of flexibility:
the size of such an array cannot be increased at runtime.

An approach to this problem is to allocate the array at runtime,
being as efficient as static arrays, but still limited to a maximum.

That's why the API rte_parray is introduced,
allowing declaration of an array of pointers which can be resized dynamically
and automatically at runtime while keeping good read performance.

After a resize, the previous array is kept until the next resize
to avoid crashes during a read without any lock.

Each element is a pointer to a memory chunk dynamically allocated.
This is not good for cache locality but it allows to keep the same
memory per element, no matter how the array is resized.
Cache locality could be improved with mempools.
The other drawback is having to dereference one more pointer
to read an element.

There are not many locks, so the API is for internal use only.
This API may be used to completely remove some compilation-time maximums.

Signed-off-by: Thomas Monjalon 
---





+int32_t
+rte_parray_find_next(struct rte_parray *obj, int32_t index)
+{
+   if (obj == NULL || index < 0) {
+   rte_errno = EINVAL;
+   return -1;
+   }
+
+   pthread_mutex_lock(&obj->mutex);
+
+   while (index < obj->size && obj->array[index] == NULL)
+   index++;
+   if (index >= obj->size)
+   index = -1;
+
+   pthread_mutex_unlock(&obj->mutex);
+
+   rte_errno = 0;
+   return index;
+}
+


Just a general comment about this:

I'm not really sure I like this "kinda-sorta-threadsafe-but-not-really" 
approach. IMO something either should be thread-safe, or it should be 
explicitly not thread-safe. There's no point in locking here because any 
user of find_next() will *necessarily* race with other users, because by 
the time we exit the function, the result becomes stale - so why are we 
locking in the first place?


Would it perhaps be better to leave it as non-thread-safe at its core, 
but introduce wrappers for atomic-like access to the array? E.g. 
something like `rte_parray_find_next_free_and_set()` that will perform 
the lock-find-next-set-unlock sequence? Or, alternatively, have the 
mutex there, but provide APIs for explicit locking, and put the burden 
on the user to actually do the locking correctly.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Morten Brørup
> From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> Sent: Wednesday, 16 June 2021 11.42
> 
> On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon 
> wrote:
> >
> > 14/06/2021 17:48, Morten Brørup:
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> Monjalon
> > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> something big enough to hold a sufficiently large array. And possibly
> add an rte_max_ethports variable to indicate the number of populated
> entries in the array, for use when iterating over the array.
> > >
> > > Can we come up with another example than RTE_MAX_ETHPORTS where
> this library provides a better benefit?
> >
> > What is big enough?
> > Is 640KB enough for RAM? ;)
> 
> If I understand it correctly, Linux process allocates 640KB due to
> that fact currently
> struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
> is from BSS.

Correct.

> If we make this from heap i.e use malloc() to allocate this memory
> then in my understanding Linux
> really won't allocate the real page for backend memory until unless,
> someone write/read to this memory.

If the array is allocated from the heap, its members will be accessed through a 
pointer to the array, e.g. in rte_eth_rx/tx_burst(). This might affect 
performance, which is probably why the array is allocated the way it is.

Although it might be worth investigating how much it actually affects the 
performance.

So we need to do something else if we want to conserve memory and still allow a 
large rte_eth_devices[] array.

Looking at struct rte_eth_dev, we could reduce its size as follows:

1. Change the two callback arrays 
post_rx/pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] to pointers to callback 
arrays, which are allocated from the heap.
With the default RTE_MAX_QUEUES_PER_PORT of 1024, these two arrays are the 
sinners that make the struct rte_eth_dev use so much memory. This modification 
would save 16 KB (minus 16 bytes for the pointers to the two arrays) per port.
Furthermore, these callback arrays would only need to be allocated if the 
application is compiled with callbacks enabled (#define 
RTE_ETHDEV_RXTX_CALLBACKS). And they would only need to be sized to the actual 
number of queues for the port.

The disadvantage is that this would add another level of indirection, although 
only for applications compiled with callbacks enabled.

2. Remove reserved_64s[4] and reserved_ptrs[4]. This would save 64 bytes per 
port. Not much, but worth considering if we are changing the API/ABI anyway.




Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK

2021-06-16 Thread Jerin Jacob
On Wed, Jun 16, 2021 at 2:43 PM Lin, Xueqin  wrote:
>
> > -Original Message-
> > From: Jerin Jacob 
> > Sent: Tuesday, June 15, 2021 4:40 PM
> > To: Peng, ZhihongX 
> > Cc: Burakov, Anatoly ; Ananyev, Konstantin
> > ; Stephen Hemminger
> > ; dpdk-dev ; Lin, Xueqin
> > 
> > Subject: Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK
> >
> > On Tue, Jun 15, 2021 at 1:46 PM  wrote:
> > >
> > > From: Zhihong Peng 
> > >
> > > AddressSanitizer (ASan) is a standard memory error detection tool from
> > > Google. It helps detect use-after-free and {heap,stack,global}-buffer
> > > overflow bugs in C/C++ programs, prints detailed error information when
> > > an error happens, and greatly improves debugging efficiency.
> > >
> > > By referring to its implementation algorithm
> > > (https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm),
> > > we ported the heap-buffer-overflow and use-after-free functions to DPDK.
> > >
> > > Here is an example of heap-buffer-overflow bug:
> > > ..
> > > char *p = rte_zmalloc(NULL, 7, 0);
> > > p[7] = 'a';
> > > ..
> > >
> > > Here is an example of use-after-free bug:
> > > ..
> > > char *p = rte_zmalloc(NULL, 7, 0);
> > > rte_free(p);
> > > *p = 'a';
> > > ..
> > >
> > > If you want to use this feature,
> > > you need to use the following compilation options:
> > > -Db_lundef=false -Db_sanitize=address
> >
> > # Thanks for this patch. It is a useful item.
> >
> > # Subject could be changed
> > from:
> > porting AddressSanitizer feature to DPDK to
> > eal: support for  AddressSanitizer
> > or so
>
> Thanks for your positive feedback and review.
> Good point, we will update the title in next version.
>
> >
> > # Could you add a section in the documentation for Sanitizers to document 
> > the
> > build time option and other points that users need to know.
>
> Makes sense to add the build option and key points to the documentation; will
> add this part in the doc folder.
>
> > We can add other sanitizers such as UBSan etc in the future here
> WIP to research other sanitizer tool.

UBSan is a good candidate.
Some old DPDK patch for the same:
http://patches.dpdk.org/project/dpdk/patch/1573832013-18946-1-git-send-email-hka...@marvell.com/

> >
> > # Add a UT test case to make sure it is working in app/test or so.
>
> This tool could help to detect memory issues; we need to add deliberately bad
> code to check that it is working.

It is better to have a UT to verify things are working. You could add it in
app/test/test_address_sanity.c. Tests can be such that:
- Skip if not compiled with sanitizers enabled
- Pass if the code detects the known bad code. You can have test cases with
public rte_ APIs that internally exercise and verify your implementation
related to the new asan_* APIs.


> Suggest listing demo code and the tool's captured output so users can check
> that the tool works; we will also add this part to the doc.
>
> >
> > # Also, Please update the release note for this feature.
> Sure, we can update the release note when the code is merged.

Probably you can send a v1 next, i.e. change the RFC status, to get it merged.
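For reference, the ASan build discussed in this thread is enabled through Meson's built-in sanitizer options; the exact invocation may vary by DPDK version:

```shell
meson setup build -Db_lundef=false -Db_sanitize=address
ninja -C build
```

Where the toolchain supports it, `-Db_sanitize=address,undefined` would additionally enable UBSan, the other sanitizer mentioned above.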


>


Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Jerin Jacob
On Wed, Jun 16, 2021 at 4:57 PM Morten Brørup  
wrote:
>
> > From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> > Sent: Wednesday, 16 June 2021 11.42
> >
> > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon 
> > wrote:
> > >
> > > 14/06/2021 17:48, Morten Brørup:
> > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > Monjalon
> > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > something big enough to hold a sufficiently large array. And possibly
> > add an rte_max_ethports variable to indicate the number of populated
> > entries in the array, for use when iterating over the array.
> > > >
> > > > Can we come up with another example than RTE_MAX_ETHPORTS where
> > this library provides a better benefit?
> > >
> > > What is big enough?
> > > Is 640KB enough for RAM? ;)
> >
> > If I understand it correctly, Linux process allocates 640KB due to
> > that fact currently
> > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
> > is from BSS.
>
> Correct.
>
> > If we make this from heap i.e use malloc() to allocate this memory
> > then in my understanding Linux
> > really won't allocate the real page for backend memory until unless,
> > someone write/read to this memory.
>
> If the array is allocated from the heap, its members will be accessed through 
> a pointer to the array, e.g. in rte_eth_rx/tx_burst(). This might affect 
> performance, which is probably why the array is allocated the way it is.
>
> Although it might be worth investigating how much it actually affects the 
> performance.

It should not. From the CPU and compiler PoV it is the same.
If you look at cryptodev, it is using the following:

static struct rte_cryptodev rte_crypto_devices[RTE_CRYPTO_MAX_DEVS];
struct rte_cryptodev *rte_cryptodevs = rte_crypto_devices;

And accessing  rte_cryptodevs[].

Also, this structure is not cache aligned. Probably need to fix it.


> So we need to do something else if we want to conserve memory and still allow 
> a large rte_eth_devices[] array.
>
> Looking at struct rte_eth_dev, we could reduce its size as follows:
>
> 1. Change the two callback arrays 
> post_rx/pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] to pointers to callback 
> arrays, which are allocated from the heap.
> With the default RTE_MAX_QUEUES_PER_PORT of 1024, these two arrays are the 
> sinners that make the struct rte_eth_dev use so much memory. This 
> modification would save 16 KB (minus 16 bytes for the pointers to the two 
> arrays) per port.
> Furthermore, these callback arrays would only need to be allocated if the 
> application is compiled with callbacks enabled (#define 
> RTE_ETHDEV_RXTX_CALLBACKS). And they would only need to be sized to the 
> actual number of queues for the port.
>
> The disadvantage is that this would add another level of indirection, 
> although only for applications compiled with callbacks enabled.

I think we don't need one more level of indirection if everything is allocated
from the heap, as memory is not wasted if not touched by the CPU.

>
> 2. Remove reserved_64s[4] and reserved_ptrs[4]. This would save 64 bytes per 
> port. Not much, but worth considering if we are changing the API/ABI anyway.
>
>


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of fengchengwen
> Sent: Wednesday, 16 June 2021 12.17
> 
> On 2021/6/16 15:09, Morten Brørup wrote:
> > I would like to share a couple of ideas for further discussion:
> >
> > 1. API for bulk operations.
> > The ability to prepare a vector of DMA operations, and then post it
> to the DMA driver.
> 
> We considered bulk operations and finally decided not to support them:
> 1. The DMA engine is not applicable to small-packet scenarios which have
>    high PPS. (The vector approach is what suits high PPS.)
> 2. To support posting bulk ops, we would need to define a standard struct
>    like rte_mbuf, and the application may need to init the struct fields
>    and pass them as a pointer array; this may cost too much CPU.
> 3. The post request is simpler than processing completed operations, and
>    CPU write performance is also good --- the driver could use vectors to
>    accelerate the processing of completed operations.

OK. Thank you for elaborating.

> >
> > 2. Prepare the API for more complex DMA operations than just
> copy/fill.
> > E.g. blitter operations like "copy A bytes from the source starting
> at address X, to the destination starting at address Y, masked with the
> bytes starting at address Z, then skip B bytes at the source and C
> bytes at the destination, rewind the mask to the beginning of Z, and
> repeat D times". This is just an example.
> > I'm suggesting to use a "DMA operation" union structure as parameter
> to the command enqueue function, rather than having individual
> functions for each possible DMA operation.
> 
> There are many situations which may be hard to capture in such a structure; I
> prefer separate APIs like copy/fill/...
> PS: I saw struct dma_device (Linux dmaengine.h) also supports various
> prep_xxx APIs.

OK. Separate functions make sense if the DMA driver does not support a large 
variety of operations, but only copy and fill.



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread David Marchand
On Tue, Jun 15, 2021 at 3:25 PM Chengwen Feng  wrote:
> +
> +#define RTE_DMADEV_NAME_MAX_LEN(64)
> +/**< @internal Max length of name of DMA PMD */
> +
> +/** @internal
> + * The data structure associated with each DMA device.
> + */
> +struct rte_dmadev {
> +   /**< Device ID for this instance */
> +   uint16_t dev_id;
> +   /**< Functions exported by PMD */
> +   const struct rte_dmadev_ops *dev_ops;
> +   /**< Device info. supplied during device initialization */
> +   struct rte_device *device;
> +   /**< Driver info. supplied by probing */
> +   const char *driver_name;
> +
> +   /**< Device name */
> +   char name[RTE_DMADEV_NAME_MAX_LEN];
> +} __rte_cache_aligned;
> +

I see no queue/channel notion.
How does a rte_dmadev object relate to a physical hw engine?


-- 
David Marchand



Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Burakov, Anatoly

On 16-Jun-21 10:42 AM, Jerin Jacob wrote:

On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon  wrote:


14/06/2021 17:48, Morten Brørup:

From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon

It would be much simpler to just increase RTE_MAX_ETHPORTS to something big 
enough to hold a sufficiently large array. And possibly add an rte_max_ethports 
variable to indicate the number of populated entries in the array, for use when 
iterating over the array.

Can we come up with another example than RTE_MAX_ETHPORTS where this library 
provides a better benefit?


What is big enough?
Is 640KB enough for RAM? ;)


If I understand it correctly, Linux process allocates 640KB due to
that fact currently
struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
is from BSS.

If we make this from heap i.e use malloc() to allocate this memory
then in my understanding Linux
really won't allocate the real page for backend memory until unless,
someone write/read to this memory.

i.e it will be free virtual memory using Linux memory management help.
If so, we can keep large values for RTE_MAX_ETHPORTS
without wasting any "real" memory even though the system has a few ports.

Thoughts?



mmap works this way with anonymous memory; I'm not so sure about 
malloc()'ed memory. Plus, we can't base these decisions on what Linux 
does, because we support other OSes. Do they do this as well?


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 1/3] net/virtio: keep device and frontend features separated

2021-06-16 Thread Xia, Chenbo
> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, June 8, 2021 10:14 PM
> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
> david.march...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [PATCH 1/3] net/virtio: keep device and frontend features separated
> 
> This patch is preliminary rework to add support for getting
> and setting device's config space.
> 
> In order to get or set a device config such as its MAC address,
> we need to know whether the device itself support the feature,
> or if it is emulated by the frontend.
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 10 ++
>  drivers/net/virtio/virtio_user_ethdev.c  |  5 +++--
>  2 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 364f43e21c..ed55cd7524 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -573,11 +573,7 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char
> *path, int queues,
>   if (dev->backend_type == VIRTIO_USER_BACKEND_VHOST_USER)
>   dev->frontend_features |= (1ull << VIRTIO_NET_F_STATUS);
> 
> - /*
> -  * Device features =
> -  * (frontend_features | backend_features) & ~unsupported_features;
> -  */
> - dev->device_features |= dev->frontend_features;
> + dev->frontend_features &= ~dev->unsupported_features;
>   dev->device_features &= ~dev->unsupported_features;
> 
>   if (rte_mem_event_callback_register(VIRTIO_USER_MEM_EVENT_CLB_NAME,
> @@ -980,12 +976,10 @@ virtio_user_dev_server_reconnect(struct virtio_user_dev
> *dev)
>   return -1;
>   }
> 
> - dev->device_features |= dev->frontend_features;
> -
>   /* unmask vhost-user unsupported features */
>   dev->device_features &= ~(dev->unsupported_features);
> 
> - dev->features &= dev->device_features;
> + dev->features &= (dev->device_features | dev->frontend_features);
> 
>   /* For packed ring, resetting queues is required in reconnection. */
>   if (virtio_with_packed_queue(hw) &&
> diff --git a/drivers/net/virtio/virtio_user_ethdev.c
> b/drivers/net/virtio/virtio_user_ethdev.c
> index e85906e9eb..3ecbb4184a 100644
> --- a/drivers/net/virtio/virtio_user_ethdev.c
> +++ b/drivers/net/virtio/virtio_user_ethdev.c
> @@ -110,7 +110,8 @@ virtio_user_get_features(struct virtio_hw *hw)
>   struct virtio_user_dev *dev = virtio_user_get_dev(hw);
> 
>   /* unmask feature bits defined in vhost user protocol */
> - return dev->device_features & VIRTIO_PMD_SUPPORTED_GUEST_FEATURES;
> + return (dev->device_features | dev->frontend_features) &
> + VIRTIO_PMD_SUPPORTED_GUEST_FEATURES;
>  }
> 
>  static void
> @@ -118,7 +119,7 @@ virtio_user_set_features(struct virtio_hw *hw, uint64_t
> features)
>  {
>   struct virtio_user_dev *dev = virtio_user_get_dev(hw);
> 
> - dev->features = features & dev->device_features;
> + dev->features = features & (dev->device_features | dev->frontend_features);
>  }
> 
>  static int
> --
> 2.31.1

Reviewed-by: Chenbo Xia 


Re: [dpdk-dev] [PATCH 2/3] net/virtio: add device config support to vDPA

2021-06-16 Thread Xia, Chenbo
Hi Maxime,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, June 8, 2021 10:14 PM
> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
> david.march...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [PATCH 2/3] net/virtio: add device config support to vDPA
> 
> This patch introduces two virtio-user callbacks to get
> and set device's config, and implements it for vDPA
> backends.
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  drivers/net/virtio/virtio_user/vhost.h  |  3 +
>  drivers/net/virtio/virtio_user/vhost_vdpa.c | 69 +
>  2 files changed, 72 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_user/vhost.h
> b/drivers/net/virtio/virtio_user/vhost.h
> index c49e88036d..dfbf6be033 100644
> --- a/drivers/net/virtio/virtio_user/vhost.h
> +++ b/drivers/net/virtio/virtio_user/vhost.h
> @@ -79,6 +79,9 @@ struct virtio_user_backend_ops {
>   int (*set_vring_addr)(struct virtio_user_dev *dev, struct
> vhost_vring_addr *addr);
>   int (*get_status)(struct virtio_user_dev *dev, uint8_t *status);
>   int (*set_status)(struct virtio_user_dev *dev, uint8_t status);
> + int (*get_config)(struct virtio_user_dev *dev, uint8_t *data, uint32_t off, uint32_t len);
> + int (*set_config)(struct virtio_user_dev *dev, const uint8_t *data, uint32_t off,
> + uint32_t len);
>   int (*enable_qp)(struct virtio_user_dev *dev, uint16_t pair_idx, int enable);
>   int (*dma_map)(struct virtio_user_dev *dev, void *addr, uint64_t iova, size_t len);
>   int (*dma_unmap)(struct virtio_user_dev *dev, void *addr, uint64_t iova, size_t len);
> diff --git a/drivers/net/virtio/virtio_user/vhost_vdpa.c
> b/drivers/net/virtio/virtio_user/vhost_vdpa.c
> index e2d6d3504d..59bc712d48 100644
> --- a/drivers/net/virtio/virtio_user/vhost_vdpa.c
> +++ b/drivers/net/virtio/virtio_user/vhost_vdpa.c
> @@ -41,6 +41,8 @@ struct vhost_vdpa_data {
>  #define VHOST_VDPA_GET_DEVICE_ID _IOR(VHOST_VIRTIO, 0x70, __u32)
>  #define VHOST_VDPA_GET_STATUS _IOR(VHOST_VIRTIO, 0x71, __u8)
>  #define VHOST_VDPA_SET_STATUS _IOW(VHOST_VIRTIO, 0x72, __u8)
> +#define VHOST_VDPA_GET_CONFIG _IOR(VHOST_VIRTIO, 0x73, struct vhost_vdpa_config)
> +#define VHOST_VDPA_SET_CONFIG _IOW(VHOST_VIRTIO, 0x74, struct vhost_vdpa_config)
>  #define VHOST_VDPA_SET_VRING_ENABLE _IOW(VHOST_VIRTIO, 0x75, struct vhost_vring_state)
>  #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
>  #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
> @@ -65,6 +67,12 @@ struct vhost_iotlb_msg {
> 
>  #define VHOST_IOTLB_MSG_V2 0x2
> 
> +struct vhost_vdpa_config {
> + uint32_t off;
> + uint32_t len;
> + uint8_t buf[0];
> +};
> +
>  struct vhost_msg {
>   uint32_t type;
>   uint32_t reserved;
> @@ -440,6 +448,65 @@ vhost_vdpa_set_status(struct virtio_user_dev *dev,
> uint8_t status)
>   return vhost_vdpa_ioctl(data->vhostfd, VHOST_VDPA_SET_STATUS, &status);
>  }
> 
> +static int
> +vhost_vdpa_get_config(struct virtio_user_dev *dev, uint8_t *data, uint32_t
> off, uint32_t len)
> +{
> + struct vhost_vdpa_data *vdpa_data = dev->backend_data;
> + struct vhost_vdpa_config *config;
> + int ret = 0;
> +
> + config = malloc(sizeof(*config) + len);
> + if (!config) {
> + PMD_DRV_LOG(ERR, "Failed to allocate vDPA config data\n");

No need to add '\n'. And same for below three 'PMD_DRV_LOG'

> + return -1;
> + }
> +
> + config->off = off;
> + config->len = len;
> +
> + ret = vhost_vdpa_ioctl(vdpa_data->vhostfd, VHOST_VDPA_GET_CONFIG,
> config);
> + if (ret) {
> + PMD_DRV_LOG(ERR, "Failed to get vDPA config (offset %x, len %x)\n",

Better add '0x' here as it will be friendly to user 😊

> off, len);
> + ret = -1;
> + goto out;
> + }
> +
> + memcpy(data, config->buf, len);
> +out:
> + free(config);
> +
> + return ret;
> +}
> +
> +static int
> +vhost_vdpa_set_config(struct virtio_user_dev *dev, const uint8_t *data,
> uint32_t off, uint32_t len)
> +{
> + struct vhost_vdpa_data *vdpa_data = dev->backend_data;
> + struct vhost_vdpa_config *config;
> + int ret = 0;
> +
> + config = malloc(sizeof(*config) + len);
> + if (!config) {
> + PMD_DRV_LOG(ERR, "Failed to allocate vDPA config data\n");
> + return -1;
> + }
> +
> + config->off = off;
> + config->len = len;
> +
> + memcpy(config->buf, data, len);
> +
> + ret = vhost_vdpa_ioctl(vdpa_data->vhostfd, VHOST_VDPA_SET_CONFIG,
> config);
> + if (ret) {
> + PMD_DRV_LOG(ERR, "Failed to set vDPA config (offset %x, len %x)\n",
> + off, len);

Ditto

Thanks,
Chenbo

> + ret = -1;
> + }
> +
> + free(config);
> +
> + return ret;
> +}
> +
>  /**
>   * Set up environment to talk with a vhost vdpa backend.
>   *
> @@ -559,6 +626,8 @@ struct virtio_user_backend_ops virtio_ops_vdpa = {
>   .set_vri

Re: [dpdk-dev] [PATCH 3/3] net/virtio: add MAC device config getter and setter

2021-06-16 Thread Xia, Chenbo
Hi Maxime,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, June 8, 2021 10:14 PM
> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
> david.march...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [PATCH 3/3] net/virtio: add MAC device config getter and setter
> 
> This patch uses the new device config ops to get and set
> the MAC address if supported.
> 
> If a valid MAC address is passed as devarg of the
> Virtio-user PMD, the driver will try to store it in the
> device config space. Otherwise the one provided in
> the device config space will be used, if available.
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  .../net/virtio/virtio_user/virtio_user_dev.c  | 85 ---
>  .../net/virtio/virtio_user/virtio_user_dev.h  |  2 +
>  drivers/net/virtio/virtio_user_ethdev.c   |  7 +-
>  3 files changed, 81 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index ed55cd7524..5c9f142024 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -260,21 +260,84 @@ int virtio_user_stop_device(struct virtio_user_dev *dev)
>   return -1;
>  }
> 
> -static inline void
> -parse_mac(struct virtio_user_dev *dev, const char *mac)
> +int
> +virtio_user_dev_set_mac(struct virtio_user_dev *dev)
>  {
> - struct rte_ether_addr tmp;
> + int ret = 0;
> 
> - if (!mac)
> - return;
> + if (!(dev->device_features & (1ULL << VIRTIO_NET_F_MAC)))
> + return -ENOTSUP;
> +
> + if (!dev->ops->set_config)
> + return -ENOTSUP;
> +
> + ret = dev->ops->set_config(dev, dev->mac_addr,
> + offsetof(struct virtio_net_config, mac),
> + RTE_ETHER_ADDR_LEN);
> + if (ret)
> + PMD_DRV_LOG(ERR, "(%s) Failed to set MAC address in device\n",
> dev->path);

No need to add '\n'. And same for below 'PMD_DRV_LOG'

Thanks,
Chenbo

> +
> + return ret;
> +}
> +
> +int
> +virtio_user_dev_get_mac(struct virtio_user_dev *dev)
> +{
> + int ret = 0;
> +
> + if (!(dev->device_features & (1ULL << VIRTIO_NET_F_MAC)))
> + return -ENOTSUP;
> +
> + if (!dev->ops->get_config)
> + return -ENOTSUP;
> +
> + ret = dev->ops->get_config(dev, dev->mac_addr,
> + offsetof(struct virtio_net_config, mac),
> + RTE_ETHER_ADDR_LEN);
> + if (ret)
> + PMD_DRV_LOG(ERR, "(%s) Failed to get MAC address from device\n",
> dev->path);
> +
> + return ret;
> +}
> +
> +static void
> +virtio_user_dev_init_mac(struct virtio_user_dev *dev, const char *mac)
> +{
> + struct rte_ether_addr cmdline_mac;
> + char buf[RTE_ETHER_ADDR_FMT_SIZE];
> + int ret;
> 
> - if (rte_ether_unformat_addr(mac, &tmp) == 0) {
> - memcpy(dev->mac_addr, &tmp, RTE_ETHER_ADDR_LEN);
> + if (mac && rte_ether_unformat_addr(mac, &cmdline_mac) == 0) {
> + /*
> +  * MAC address was passed from command-line, try to store
> +  * it in the device if it supports it. Otherwise try to use
> +  * the device one.
> +  */
> + memcpy(dev->mac_addr, &cmdline_mac, RTE_ETHER_ADDR_LEN);
>   dev->mac_specified = 1;
> +
> + /* Setting MAC may fail, continue to get the device one in this
> case */
> + virtio_user_dev_set_mac(dev);
> + ret = virtio_user_dev_get_mac(dev);
> + if (ret == -ENOTSUP)
> + goto out;
> +
> + if (memcmp(&cmdline_mac, dev->mac_addr, RTE_ETHER_ADDR_LEN))
> + PMD_DRV_LOG(INFO, "(%s) Device MAC update failed", dev->path);
>   } else {
> - /* ignore the wrong mac, use random mac */
> - PMD_DRV_LOG(ERR, "wrong format of mac: %s", mac);
> + ret = virtio_user_dev_get_mac(dev);
> + if (ret) {
> + PMD_DRV_LOG(ERR, "(%s) No valid MAC in devargs or device, use random",
> + dev->path);
> + return;
> + }
> +
> + dev->mac_specified = 1;
>   }
> +out:
> + rte_ether_format_addr(buf, RTE_ETHER_ADDR_FMT_SIZE,
> + (struct rte_ether_addr *)dev->mac_addr);
> + PMD_DRV_LOG(INFO, "(%s) MAC %s specified", dev->path, buf);
>  }
> 
>  static int
> @@ -509,8 +572,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char
> *path, int queues,
>   dev->unsupported_features = 0;
>   dev->backend_type = backend_type;
> 
> - parse_mac(dev, mac);
> -
>   if (*ifname) {
>   dev->ifname = *ifname;
>   *ifname = NULL;
> @@ -538,6 +599,8 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char
> *path, int queues,
>   return -1;
>   }
> 
> + virtio_user_dev_init_mac(dev, mac);

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Jerin Jacob
On Wed, Jun 16, 2021 at 5:52 PM Burakov, Anatoly
 wrote:
>
> On 16-Jun-21 10:42 AM, Jerin Jacob wrote:
> > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon  
> > wrote:
> >>
> >> 14/06/2021 17:48, Morten Brørup:
>  From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> >>> It would be much simpler to just increase RTE_MAX_ETHPORTS to something 
> >>> big enough to hold a sufficiently large array. And possibly add an 
> >>> rte_max_ethports variable to indicate the number of populated entries in 
> >>> the array, for use when iterating over the array.
> >>>
> >>> Can we come up with another example than RTE_MAX_ETHPORTS where this 
> >>> library provides a better benefit?
> >>
> >> What is big enough?
> >> Is 640KB enough for RAM? ;)
> >
> > If I understand it correctly, Linux process allocates 640KB due to
> > that fact currently
> > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
> > is from BSS.
> >
> > If we make this from heap i.e use malloc() to allocate this memory
> > then in my understanding Linux
> > really won't allocate the real page for backend memory until unless,
> > someone write/read to this memory.
> >
> > i.e it will be free virtual memory using Linux memory management help.
> > If so, we can keep large values for RTE_MAX_ETHPORTS
> > without wasting any "real" memory even though the system has a few ports.
> >
> > Thoughts?
> >
>
> mmap works this way with anonymous memory, i'm not so sure about
> malloc()'ed memory.

Looking at online documentation scattered over the internet, sbrk() is
based on demand paging.
So I am not sure either. I am also not sure how we could write a
test case to verify it.
Allocating a huge amount of memory through malloc() without failure: I am not
sure whether that is due to demand paging,
the Linux overcommit feature, or a combination of both.

If mmap works this way, we could have an EAL abstraction for such
memory allocation, like
eal_malloc_demand_page() or so, provided Windows also supports it.



> Plus, we can't base these decisions on what Linux
> does because we support other OS's. Do they do this as well?

+ Windows OS maintainers

>
> --
> Thanks,
> Anatoly


Re: [dpdk-dev] [PATCH] ixgbe: Add runtime tx/rx queue setup for X550

2021-06-16 Thread Zhang, Qi Z
Comments inline.

From: Wu, Jianyue (NSB - CN/Hangzhou) 
Sent: Wednesday, June 16, 2021 1:04 PM
To: Zhang, Qi Z 
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] [PATCH] ixgbe: Add runtime tx/rx queue setup for X550

Hi, Qi,

OK, thanks indeed for the explanation, appreciated😊
BTW, I see the patch is marked Superseded; does that mean it is not needed, or that it needs an update?

[Qi:] Because I assume you are required to submit a new version based on the 
previous comments anyway.

> 1.  why only enable this for x550, is it because you only need this or you 
> already know other ixgbe device has the limitation?
I’m using this x550 device, but I am not sure whether other devices have such a 
limitation. I am sure that the X550 can support it, because I checked the doc 
and also tested it in the environment.

[Qi:] I hope we can enable this for the whole ixgbe family if we know there is 
no limitation; I think we need to check with the ixgbe maintainer.

  > 2. As the patch exposes a new feature of the driver, could you provide 
more test information?
I tested creating a new Tx queue after rte_eth_dev_start() and using it in my 
code, but I haven’t tested it with DPDK code, like testpmd, yet.
3. You need to update the documentation in your patch; this includes
1) doc/guides/nics/features/ixgbe.ini
2) if only the x550 supports runtime queue configuration, you’d better explain 
this in doc/guides/nics/ixgbe.rst.
OK, thanks, will check the documents. BTW, I see that at least the 82599, 
X540, and X520 are 10GbE, from http://doc.dpdk.org/guides/nics/ixgbe.html. I can 
check their documents, but it would be better to run some further tests; is 
there any environment I can use to test it? Thanks.

[Qi:], I don’t think we have any public environment for testing, anyways thanks 
for your contribution.

Best Regards,
Dave(Jianyue)

From: Zhang, Qi Z <qi.z.zh...@intel.com>
Sent: 2021年6月16日 8:31
To: Wu, Jianyue (NSB - CN/Hangzhou) <jianyue...@nokia-sbell.com>
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] [PATCH] ixgbe: Add runtime tx/rx queue setup for X550

Hi Jianyue:

 I think you can ignore those warnings.

 Below are a couple of questions for this patch:
 1. Why enable this only for the x550? Is it because you only need this, or do 
you already know other ixgbe devices have the limitation?
2. As the patch exposes a new feature of the driver, could you provide more 
test information?
3. You need to update the documentation in your patch; this includes
1) doc/guides/nics/features/ixgbe.ini
2) if only the x550 supports runtime queue configuration, you’d better explain 
this in doc/guides/nics/ixgbe.rst.

Thanks
Qi

From: Wu, Jianyue (NSB - CN/Hangzhou) <jianyue...@nokia-sbell.com>
Sent: Wednesday, May 26, 2021 1:30 PM
To: Zhang, Qi Z <qi.z.zh...@intel.com>
Cc: dev@dpdk.org
Subject: [dpdk-dev] [PATCH] ixgbe: Add runtime tx/rx queue setup for X550


Hello, Zhang, qi,



May I ask a question?

I see this patch is delegate to you.

It seems some cases failed in this link, but they are i40e cases; can we ignore 
them? They seem unrelated to this patch. Thanks😊

http://patches.dpdk.org/project/dpdk/patch/20210524115329.40525-1-jianyue...@nokia-sbell.com/

https://lab.dpdk.org/results/dashboard/patchsets/17199/



Thanks,

Best regards,

Dave


Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 01:27:17PM +0200, Morten Brørup wrote:
> > From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> > Sent: Wednesday, 16 June 2021 11.42
> > 
> > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon 
> > wrote:
> > >
> > > 14/06/2021 17:48, Morten Brørup:
> > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > Monjalon
> > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > something big enough to hold a sufficiently large array. And possibly
> > add an rte_max_ethports variable to indicate the number of populated
> > entries in the array, for use when iterating over the array.
> > > >
> > > > Can we come up with another example than RTE_MAX_ETHPORTS where
> > this library provides a better benefit?
> > >
> > > What is big enough?
> > > Is 640KB enough for RAM? ;)
> > 
> > If I understand it correctly, Linux process allocates 640KB due to
> > that fact currently
> > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
> > is from BSS.
> 
> Correct.
> 
> > If we make this from heap i.e use malloc() to allocate this memory
> > then in my understanding Linux
> > really won't allocate the real page for backend memory until unless,
> > someone write/read to this memory.
> 
> If the array is allocated from the heap, its members will be accessed through 
> a pointer to the array, e.g. in rte_eth_rx/tx_burst(). This might affect 
> performance, which is probably why the array is allocated the way it is.
>

It depends on whether the array contains pointers to malloced elements or
the array itself is just a single malloced array of all the structures.
While I think the parray proposal referred to the former - which would have
an extra level of indirection - the switch we are discussing here is the
latter which should have no performance difference, since the method of
accessing the elements will be the same, only with the base address
pointing to a different area of memory.
 
> Although it might be worth investigating how much it actually affects the 
> performance.
> 
> So we need to do something else if we want to conserve memory and still allow 
> a large rte_eth_devices[] array.
> 
> Looking at struct rte_eth_dev, we could reduce its size as follows:
> 
> 1. Change the two callback arrays 
> post_rx/pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] to pointers to callback 
> arrays, which are allocated from the heap.
> With the default RTE_MAX_QUEUES_PER_PORT of 1024, these two arrays are the 
> sinners that make the struct rte_eth_dev use so much memory. This 
> modification would save 16 KB (minus 16 bytes for the pointers to the two 
> arrays) per port.
> Furthermore, these callback arrays would only need to be allocated if the 
> application is compiled with callbacks enabled (#define 
> RTE_ETHDEV_RXTX_CALLBACKS). And they would only need to be sized to the 
> actual number of queues for the port.
> 
> The disadvantage is that this would add another level of indirection, 
> although only for applications compiled with callbacks enabled.
> 
This seems reasonable to at least investigate.

> 2. Remove reserved_64s[4] and reserved_ptrs[4]. This would save 64 bytes per 
> port. Not much, but worth considering if we are changing the API/ABI anyway.
> 
I strongly dislike reserved fields, so I would tend to favour removing these.
However, it does possibly reduce future compatibility if we do need to add
something to ethdev.

Another option is to split ethdev into fast-path and non-fastpath parts -
similar to Konstantin's suggestion of just having an array of the ops. We
can have an array of minimal structures with fastpath ops and queue
pointers, for example, with an ethdev-private pointer to the rest of the
struct elsewhere in memory. Since that second struct would be allocated
on-demand, the size of the ethdev array can be scaled with far smaller
footprint.

/Bruce


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 06:17:07PM +0800, fengchengwen wrote:
> On 2021/6/16 15:09, Morten Brørup wrote:
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> >> Sent: Tuesday, 15 June 2021 18.39
> >>
> >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >>> This patch introduces 'dmadevice' which is a generic type of DMA
> >>> device.
> >>>
> >>> The APIs of dmadev library exposes some generic operations which can
> >>> enable configuration and I/O with the DMA devices.
> >>>
> >>> Signed-off-by: Chengwen Feng 
> >>> ---
> >> Thanks for sending this.
> >>
> >> Of most interest to me right now are the key data-plane APIs. While we
> >> are
> >> still in the prototyping phase, below is a draft of what we are
> >> thinking
> >> for the key enqueue/perform_ops/completed_ops APIs.
> >>
> >> Some key differences I note in below vs your original RFC:
> >> * Use of void pointers rather than iova addresses. While using iova's
> >> makes
> >>   sense in the general case when using hardware, in that it can work
> >> with
> >>   both physical addresses and virtual addresses, if we change the APIs
> >> to use
> >>   void pointers instead it will still work for DPDK in VA mode, while
> >> at the
> >>   same time allow use of software fallbacks in error cases, and also a
> >> stub
> >>   driver than uses memcpy in the background. Finally, using iova's
> >> makes the
> >>   APIs a lot more awkward to use with anything but mbufs or similar
> >> buffers
> >>   where we already have a pre-computed physical address.
> >> * Use of id values rather than user-provided handles. Allowing the
> >> user/app
> >>   to manage the amount of data stored per operation is a better
> >> solution, I
> >>   feel than proscribing a certain about of in-driver tracking. Some
> >> apps may
> >>   not care about anything other than a job being completed, while other
> >> apps
> >>   may have significant metadata to be tracked. Taking the user-context
> >>   handles out of the API also makes the driver code simpler.
> >> * I've kept a single combined API for completions, which differs from
> >> the
> >>   separate error handling completion API you propose. I need to give
> >> the
> >>   two function approach a bit of thought, but likely both could work.
> >> If we
> >>   (likely) never expect failed ops, then the specifics of error
> >> handling
> >>   should not matter that much.
> >>
> >> For the rest, the control / setup APIs are likely to be rather
> >> uncontroversial, I suspect. However, I think that rather than xstats
> >> APIs,
> >> the library should first provide a set of standardized stats like
> >> ethdev
> >> does. If driver-specific stats are needed, we can add xstats later to
> >> the
> >> API.
> >>
> >> Appreciate your further thoughts on this, thanks.
> >>
> >> Regards,
> >> /Bruce
> > 
> > I generally agree with Bruce's points above.
> > 
> > I would like to share a couple of ideas for further discussion:
> > 
> > 1. API for bulk operations.
> > The ability to prepare a vector of DMA operations, and then post it to the 
> > DMA driver.
> 
> We considered bulk operations and finally decided not to support them:
> 1. The DMA engine is not applicable to small-packet scenarios which have high 
> PPS.
>PS: The vector approach is suitable for high PPS.
> 2. To support posting bulk ops, we would need to define a standard struct like
>rte_mbuf, and the application may need to init the struct fields and pass
>them as a pointer array; this may cost too much CPU.
> 3. Posting a request is simpler than processing completed operations, and the
>CPU write performance is also good. ---the driver could use vectors to
>accelerate the processing of completed operations.
> 

+1 to this. We also looked previously at using bulk APIs for dma offload,
but the cost of building up the structs to pass in, only to have those
structs decomposed again inside the function was adding a lot of
unnecessary overhead. By using individual functions per op, all parameters
are passed via registers, and we can write descriptors faster from those
registers than having to do cache reads.

> > 
> > 2. Prepare the API for more complex DMA operations than just copy/fill.
> > E.g. blitter operations like "copy A bytes from the source starting at 
> > address X, to the destination starting at address Y, masked with the bytes 
> > starting at address Z, then skip B bytes at the source and C bytes at the 
> > destination, rewind the mask to the beginning of Z, and repeat D times". 
> > This is just an example.
> > I'm suggesting to use a "DMA operation" union structure as parameter to the 
> > command enqueue function, rather than having individual functions for each 
> > possible DMA operation.
> 
> There are many situations which may be hard to cover with such a structure; I 
> prefer separate APIs like copy/fill/...
> PS: I saw struct dma_device (Linux dmaengine.h) also support various prep_xxx 
> API.
> 

I think the API set will be defined by what the various hardware drivers

Re: [dpdk-dev] [PATCH v1] net/mlx5: fix IPIP multi tunnel validation

2021-06-16 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: dev  On Behalf Of Lior Margalit
> Sent: Wednesday, June 16, 2021 10:01 AM
> To: dev@dpdk.org; Slava Ovsiienko ; Matan Azrad
> 
> Cc: Ori Kam ; Lior Margalit ;
> sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH v1] net/mlx5: fix IPIP multi tunnel validation
> 
> A flow rule must not include multiple tunnel layers.
> An attempt to create such a rule, for example:
> testpmd> flow create .../ vxlan / eth / ipv4 proto is 4 / end 
> results in an unclear error.
> 
> In the current implementation there is a check for
> multiple IPIP tunnels, but not for combination of IPIP
> and a different kind of tunnel, such as VXLAN. The fix
> is to enhance the above check to use MLX5_FLOW_LAYER_TUNNEL
> that consists of all the tunnel masks. The error message
> will be "multiple tunnel not supported".
> 
> Fixes: 5e33bebdd8d3 ("net/mlx5: support IP-in-IP tunnel")
> Cc: sta...@dpdk.org
> 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 02:14:54PM +0200, David Marchand wrote:
> On Tue, Jun 15, 2021 at 3:25 PM Chengwen Feng  wrote:
> > +
> > +#define RTE_DMADEV_NAME_MAX_LEN(64)
> > +/**< @internal Max length of name of DMA PMD */
> > +
> > +/** @internal
> > + * The data structure associated with each DMA device.
> > + */
> > +struct rte_dmadev {
> > +   /**< Device ID for this instance */
> > +   uint16_t dev_id;
> > +   /**< Functions exported by PMD */
> > +   const struct rte_dmadev_ops *dev_ops;
> > +   /**< Device info. supplied during device initialization */
> > +   struct rte_device *device;
> > +   /**< Driver info. supplied by probing */
> > +   const char *driver_name;
> > +
> > +   /**< Device name */
> > +   char name[RTE_DMADEV_NAME_MAX_LEN];
> > +} __rte_cache_aligned;
> > +
> 
> I see no queue/channel notion.
> How does a rte_dmadev object relate to a physical hw engine?
> 
One queue, one device.
When looking to update the ioat driver for 20.11 release when I added the
idxd part, I considered adding a queue parameter to the API to look like
one device with multiple queues. However, since each queue acts completely
independently of each other, there was no benefit to doing so. It's just
easier to have a single id to identify a device queue.


Re: [dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW scanning func

2021-06-16 Thread Zhang, Qi Z
Hi

> -Original Message-
> From: Honnappa Nagarahalli 
> Sent: Tuesday, June 8, 2021 5:36 AM
> To: Zhang, Qi Z ; Joyce Kong ;
> Xing, Beilei ; Ruifeng Wang 
> Cc: dev@dpdk.org; nd ; Honnappa Nagarahalli
> ; nd 
> Subject: RE: [PATCH v1] net/i40e: remove the SMP barrier in HW scanning
> func
> 
> 
> 
> > >
> > > > >
> > > > > Add the logic to determine how many DD bits have been set for
> > > > > contiguous packets, for removing the SMP barrier while reading descs.
> > > >
> > > > I didn't understand this.
> > > > The current logic already guarantee the read out DD bits are from
> > > > continue packets, as it read Rx descriptor in a reversed order
> > > > from the
> > ring.
> > > Qi, the comments in the code mention that there is a race condition
> > > if the descriptors are not read in the reverse order. But, they do
> > > not mention what the race condition is and how it can occur.
> > > Appreciate if you could explain that.
> >
> > The Race condition happens between the NIC and CPU, if write and read
> > DD bit in the same order, there might be a hole (e.g. 1011)  with the
> > reverse read order, we make sure no more "1" after the first "0"
> > as the read address are declared as volatile, compiler will not
> > re-ordered them.
> My understanding is that
> 
> 1) the NIC will write an entire cache line of descriptors to memory 
> "atomically"
> (i.e. the entire cache line is visible to the CPU at once) if there are enough
> descriptors ready to fill one cache line.
> 2) But, if there are not enough descriptors ready (because for ex: there is 
> not
> enough traffic), then it might write partial cache lines.

Yes, for example a cache line contains 4 x 16-byte descriptors, and it is 
possible we get 1 1 1 0 for the DD bits at some moment.

> 
> Please correct me if I am wrong.
> 
> For #1, I do not think it matters if we read the descriptors in reverse order 
> or
> not as the cache line is written atomically.

I think the below case may happen if we don't read in reverse order.

1. The CPU gets the first cache line as 1 1 1 0 in a loop.
2. New packets come in and the NIC appends the last 1 to the first cache line, 
plus a new cache line with 1 1 1 1.
3. The CPU continues with the new cache line of 1 1 1 1 in the same loop, but the 
last 1 of the first cache line is missed, so finally it gets 1 1 1 0 1 1 1 1. 


> For #1, if we read in reverse order, does it make sense to not check the DD 
> bits
> of descriptors that are earlier in the order once we encounter a descriptor 
> that
> has its DD bit set? This is because NIC updates the descriptors in order.

I think the answer is yes: when we meet the first DD bit, we should be able to 
calculate the exact number based on the index, but I am not sure how much 
performance gain that brings.


> 
> >
> > >
> > > On x86, the reads are not re-ordered (though the compiler can
> > > re-order). On ARM, the reads can get re-ordered and hence the
> > > barriers are required. In order to avoid the barriers, we are trying
> > > to process only those descriptors whose DD bits are set such that
> > > they are contiguous. i.e. if the DD bits are 1011, we process only the 
> > > first
> descriptor.
> >
> > Ok, I see. thanks for the explanation.
> > At this moment, I may prefer not change the behavior of x86, so
> > compile option for arm can be added, in future when we observe no
> > performance impact for x86 as well, we can consider to remove it, what do
> you think?
> I am ok with this approach.
> 
> >
> > >
> > > > So I didn't see the a new logic be added, would you describe more
> > > > clear about the purpose of this patch?
> > > >
> > > > >
> > > > > Signed-off-by: Joyce Kong 
> > > > > Reviewed-by: Ruifeng Wang 
> > > > > ---
> > > > >  drivers/net/i40e/i40e_rxtx.c | 13 -
> > > > >  1 file changed, 8 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > > > b/drivers/net/i40e/i40e_rxtx.c index
> > > > > 6c58decec..410a81f30 100644
> > > > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > > > @@ -452,7 +452,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue
> > *rxq)
> > > > >   uint16_t pkt_len;
> > > > >   uint64_t qword1;
> > > > >   uint32_t rx_status;
> > > > > - int32_t s[I40E_LOOK_AHEAD], nb_dd;
> > > > > + int32_t s[I40E_LOOK_AHEAD], var, nb_dd;
> > > > >   int32_t i, j, nb_rx = 0;
> > > > >   uint64_t pkt_flags;
> > > > >   uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
> > > > > @@ -482,11 +482,14 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > > > >   I40E_RXD_QW1_STATUS_SHIFT;
> > > > >   }
> > > > >
> > > > > - rte_smp_rmb();
> > > >
> > > > Any performance gain by removing this? and it is not necessary to
> > > > be combined with below change, right?
> > > >
> > > > > -
> > > > >   /* Compute how many status bits were set */
> > > > > - for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
> > > > > -  

Re: [dpdk-dev] [PATCH v1] net/ice/base: fix wrong ptype bitmap for IP fragment

2021-06-16 Thread Zhang, Qi Z



> -Original Message-
> From: Xu, Ting 
> Sent: Thursday, June 10, 2021 10:45 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z ; Xu, Ting ;
> sta...@dpdk.org
> Subject: [PATCH v1] net/ice/base: fix wrong ptype bitmap for IP fragment
> 
> IPv4 and IPv6 fragment ptypes are supposed to be separated from IP other
> ptypes. New bitmaps for IP fragment ptypes were created, but the IP fragment
> ptypes were not deleted from the previous non-frag bitmaps, which will cause
> conflicts. This patch removes IP fragment ptypes from the non-frag bitmaps.
> 
> Fixes: 843452817561 ("net/ice/base: support IP fragment RSS and FDIR")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ting Xu 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



Re: [dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW scanning func

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 01:29:24PM +, Zhang, Qi Z wrote:
> Hi
> 
> > -Original Message-
> > From: Honnappa Nagarahalli 
> > Sent: Tuesday, June 8, 2021 5:36 AM
> > To: Zhang, Qi Z ; Joyce Kong ;
> > Xing, Beilei ; Ruifeng Wang 
> > Cc: dev@dpdk.org; nd ; Honnappa Nagarahalli
> > ; nd 
> > Subject: RE: [PATCH v1] net/i40e: remove the SMP barrier in HW scanning
> > func
> > 
> > 
> > 
> > > >
> > > > > >
> > > > > > Add the logic to determine how many DD bits have been set for
> > > > > > contiguous packets, for removing the SMP barrier while reading 
> > > > > > descs.
> > > > >
> > > > > I didn't understand this.
> > > > > The current logic already guarantee the read out DD bits are from
> > > > > continue packets, as it read Rx descriptor in a reversed order
> > > > > from the
> > > ring.
> > > > Qi, the comments in the code mention that there is a race condition
> > > > if the descriptors are not read in the reverse order. But, they do
> > > > not mention what the race condition is and how it can occur.
> > > > Appreciate if you could explain that.
> > >
> > > The Race condition happens between the NIC and CPU, if write and read
> > > DD bit in the same order, there might be a hole (e.g. 1011)  with the
> > > reverse read order, we make sure no more "1" after the first "0"
> > > as the read address are declared as volatile, compiler will not
> > > re-ordered them.
> > My understanding is that
> > 
> > 1) the NIC will write an entire cache line of descriptors to memory 
> > "atomically"
> > (i.e. the entire cache line is visible to the CPU at once) if there are 
> > enough
> > descriptors ready to fill one cache line.
> > 2) But, if there are not enough descriptors ready (because for ex: there is 
> > not
> > enough traffic), then it might write partial cache lines.
> 
> Yes, for example a cache line contains 4 x16 bytes descriptors and it is 
> possible we get 1 1 1 0 for DD bit at some moment.
> 
> > 
> > Please correct me if I am wrong.
> > 
> > For #1, I do not think it matters if we read the descriptors in reverse 
> > order or
> > not as the cache line is written atomically.
> 
> I think below cases may happens if we don't read in reserve order.
> 
> 1. CPU get first cache line as 1 1 1 0 in a loop
> 2. new packets coming and NIC append last 1 to the first cache and a new 
> cache line with 1 1 1 1.
> 3. CPU continue new cache line with 1 1 1 1 in the same loop, but the last 1 
> of first cache line is missed, so finally it get 1 1 1 0 1 1 1 1. 
> 

The one-sentence answer here is: when two entities are moving along a line
in the same direction - like two runners in a race - then they can pass
each other multiple times as each goes slower or faster at any point in
time, whereas if they are moving in opposite directions there will only
ever be one cross-over point no matter how the speed of each changes. 

In the case of NIC and software this fact means that there will always be a
clear cross-over point from DD set to not-set.

> 
> > For #1, if we read in reverse order, does it make sense to not check the DD 
> > bits
> > of descriptors that are earlier in the order once we encounter a descriptor 
> > that
> > has its DD bit set? This is because NIC updates the descriptors in order.
> 
> I think the answer is yes, when we met the first DD bit, we should able to 
> calculated the exact number base on the index, but not sure how much 
> performance gain.
> 
The other factors here are:
1. The driver does not do a straight read of all 32 DD bits in one go,
rather it does 8 at a time and aborts at the end of a set of 8 if not all
are valid.
2. For any that are set, we have to read the descriptor anyway to get the
packet data out of it, so in the shortcut case of the last descriptor being
set, we still have to read the other 7 anyway, and DD comes for free as
part of it.
3. Blindly reading 8 at a time reduces the branching to just a single
decision point at the end of each set of 8, reducing possible branch
mispredicts.


Re: [dpdk-dev] [PATCH v4] vhost: check header for legacy dequeue offload

2021-06-16 Thread Wang, Xiao W
Hi David,

Thanks for your comments.
I agree with your suggestions. BTW, I noticed some other invalid corner cases 
which need rolling back mbuf->l2_len, l3_len and ol_flags.
E.g. the default case in the "switch {}" context is not valid.
Besides, the l4_proto variable would be better as a uint8_t rather than a uint16_t.

I will prepare a new version.

BRs,
Xiao

> -Original Message-
> From: David Marchand 
> Sent: Tuesday, June 15, 2021 3:57 PM
> To: Wang, Xiao W 
> Cc: Maxime Coquelin ; Xia, Chenbo
> ; Jiang, Cheng1 ; dev
> ; dpdk stable 
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: check header for legacy dequeue
> offload
> 
> On Tue, Jun 15, 2021 at 9:06 AM Xiao Wang wrote:
> > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> > index 8da8a86a10..351ff0a841 100644
> > --- a/lib/vhost/virtio_net.c
> > +++ b/lib/vhost/virtio_net.c
> > @@ -2259,44 +2259,64 @@ virtio_net_with_host_offload(struct virtio_net *dev)
> > return false;
> >  }
> >
> > -static void
> > -parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
> > +static int
> > +parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
> > +   uint16_t *len)
> >  {
> 
> 
> This function name is misleading, name could be parse_headers().
> Its semantic gets more and more confusing with those l4_hdr and len
> pointers.
> 
> This function fills ->lX_len in the mbuf, everything is available for caller.
> 
> Caller can check that rte_pktmbuf_data_len() is >= m->l2_len +
> m->l3_len + somesize.
> => no need for len.
> 
> l4_hdr can simply be deduced with rte_pktmbuf_mtod_offset(m, struct
> somestruct *, m->l2_len + m->l3_len).
> => no need for l4_hdr.
> 
> 
> > struct rte_ipv4_hdr *ipv4_hdr;
> > struct rte_ipv6_hdr *ipv6_hdr;
> > void *l3_hdr = NULL;
> 
> No need for l3_hdr.
> 
> 
> > struct rte_ether_hdr *eth_hdr;
> > uint16_t ethertype;
> > +   uint16_t data_len = m->data_len;
> 
> Avoid direct access to mbuf internals, we have inline helpers:
> rte_pktmbuf_data_len(m).
> 
> 
> > +
> > +   if (data_len <= sizeof(struct rte_ether_hdr))
> 
> Strictly speaking, < is enough.
> 
> 
> > +   return -EINVAL;
> >
> > eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
> >
> > m->l2_len = sizeof(struct rte_ether_hdr);
> > ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
> > +   data_len -= sizeof(struct rte_ether_hdr);
> 
> No need to decrement data_len if checks below are all done for absolute
> value.
> See suggestions below.
> 
> 
> >
> > if (ethertype == RTE_ETHER_TYPE_VLAN) {
> > +   if (data_len <= sizeof(struct rte_vlan_hdr))
> > +   return -EINVAL;
> 
> if (data_len < sizeof(rte_ether_hdr) + sizeof(struct rte_vlan_hdr))
> 
> 
> > +
> > struct rte_vlan_hdr *vlan_hdr =
> > (struct rte_vlan_hdr *)(eth_hdr + 1);
> >
> > m->l2_len += sizeof(struct rte_vlan_hdr);
> > ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
> > +   data_len -= sizeof(struct rte_vlan_hdr);
> 
> Idem.
> 
> 
> > }
> >
> > l3_hdr = (char *)eth_hdr + m->l2_len;
> >
> > switch (ethertype) {
> > case RTE_ETHER_TYPE_IPV4:
> > +   if (data_len <= sizeof(struct rte_ipv4_hdr))
> > +   return -EINVAL;
> 
> if (data_len < m->l2_len + sizeof(struct rte_ipv4_hdr))
> 
> 
> > ipv4_hdr = l3_hdr;
> 
> ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *, m->l2_len);
> 
> 
> > *l4_proto = ipv4_hdr->next_proto_id;
> > m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
> > +   if (data_len <= m->l3_len) {
> 
> if (data_len < m->l2_len + m->l3_len)
> 
> 
> > +   m->l3_len = 0;
> > +   return -EINVAL;
> 
> Returning here leaves m->l2_len set.
> 
> 
> > +   }
> > *l4_hdr = (char *)l3_hdr + m->l3_len;
> > m->ol_flags |= PKT_TX_IPV4;
> > +   data_len -= m->l3_len;
> > break;
> > case RTE_ETHER_TYPE_IPV6:
> > +   if (data_len <= sizeof(struct rte_ipv6_hdr))
> > +   return -EINVAL;
> 
> if (data_len < m->l2_len + sizeof(struct rte_ipv6_hdr))
> Returning here leaves m->l2_len set.
> 
> 
> > ipv6_hdr = l3_hdr;
> 
> ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct rte_ipv6_hdr *, m->l2_len);
> 
> 
> > *l4_proto = ipv6_hdr->proto;
> > m->l3_len = sizeof(struct rte_ipv6_hdr);
> > *l4_hdr = (char *)l3_hdr + m->l3_len;
> > m->ol_flags |= PKT_TX_IPV6;
> > +   data_len -= m->l3_len;
> > break;
> > default:
> > m->l3_len = 0;
> > @@ -2304,6 +2324,9 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
> > *l4_hdr = NULL;
> 

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Jerin Jacob
On Wed, Jun 16, 2021 at 3:47 PM fengchengwen  wrote:
>
> On 2021/6/16 15:09, Morten Brørup wrote:
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> >> Sent: Tuesday, 15 June 2021 18.39
> >>
> >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >>> This patch introduces 'dmadevice' which is a generic type of DMA
> >>> device.
> >>>
> >>> The APIs of dmadev library exposes some generic operations which can
> >>> enable configuration and I/O with the DMA devices.
> >>>
> >>> Signed-off-by: Chengwen Feng 
> >>> ---
> >> Thanks for sending this.
> >>
> >> Of most interest to me right now are the key data-plane APIs. While we
> >> are
> >> still in the prototyping phase, below is a draft of what we are
> >> thinking
> >> for the key enqueue/perform_ops/completed_ops APIs.
> >>
> >> Some key differences I note in below vs your original RFC:
> >> * Use of void pointers rather than iova addresses. While using iova's
> >> makes
> >>   sense in the general case when using hardware, in that it can work
> >> with
> >>   both physical addresses and virtual addresses, if we change the APIs
> >> to use
> >>   void pointers instead it will still work for DPDK in VA mode, while
> >> at the
> >>   same time allow use of software fallbacks in error cases, and also a
> >> stub
> >>   driver than uses memcpy in the background. Finally, using iova's
> >> makes the
> >>   APIs a lot more awkward to use with anything but mbufs or similar
> >> buffers
> >>   where we already have a pre-computed physical address.
> >> * Use of id values rather than user-provided handles. Allowing the
> >> user/app
> >>   to manage the amount of data stored per operation is a better
> >> solution, I
> >>   feel than proscribing a certain about of in-driver tracking. Some
> >> apps may
> >>   not care about anything other than a job being completed, while other
> >> apps
> >>   may have significant metadata to be tracked. Taking the user-context
> >>   handles out of the API also makes the driver code simpler.
> >> * I've kept a single combined API for completions, which differs from
> >> the
> >>   separate error handling completion API you propose. I need to give
> >> the
> >>   two function approach a bit of thought, but likely both could work.
> >> If we
> >>   (likely) never expect failed ops, then the specifics of error
> >> handling
> >>   should not matter that much.
> >>
> >> For the rest, the control / setup APIs are likely to be rather
> >> uncontroversial, I suspect. However, I think that rather than xstats
> >> APIs,
> >> the library should first provide a set of standardized stats like
> >> ethdev
> >> does. If driver-specific stats are needed, we can add xstats later to
> >> the
> >> API.
> >>
> >> Appreciate your further thoughts on this, thanks.
> >>
> >> Regards,
> >> /Bruce
> >
> > I generally agree with Bruce's points above.
> >
> > I would like to share a couple of ideas for further discussion:


I believe some of the other requirements and comments for generic DMA will be

1) Support for the _channel_. Each channel may have different
capabilities and functionality.
Typical cases are where each channel has a separate source and
destination device, e.g. DMA between a PCIe EP and host memory,
host memory and host memory, or a PCIe EP and another PCIe EP.
So we need some notion of a channel in the specification.

2) I assume the current data-plane APIs are not thread-safe. Is that right?


3) The cookie scheme outlined earlier looks good to me, instead of
having a generic dequeue() API.

4) rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst,
unsigned int length); can be split into a two-stage API, where one
stage is used in the fast path and the other in the slow path.

- The slow-path API will take the channel and the other attributes of
the transfer.

Example syntax would be:

struct rte_dmadev_desc {
	channel id;
	ops; /* copy, xor, fill, etc. */
	/* other arguments specific to the DMA transfer; can be set
	 * based on capability */
};

rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct
rte_dmadev_desc *desc);

- The fast-path API takes the arguments that change per transfer,
along with the slow-path handle:

rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
unsigned int length, rte_dmadev_desc_t desc);

This will help the driver to:
- form the device-specific descriptors in the slow path (former API)
for a given channel and the fixed per-transfer attributes
- blend the "variable" arguments such as src/dst addresses (latter
API) into the descriptors created in the slow path

The above will give better performance and is the best trade-off
between performance and per-transfer variability.
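To make the split concrete, here is a minimal, hypothetical sketch of the two-stage scheme with a memcpy-backed stub driver. All names here (dma_prepare, dma_enqueue, struct dma_desc) are illustrative assumptions drawn from the proposal above, not an existing DPDK API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical op codes, per the proposal above. */
enum dma_op { DMA_OP_COPY, DMA_OP_FILL };

/* Slow-path descriptor: channel plus fixed per-transfer attributes. */
struct dma_desc {
	uint16_t channel;
	enum dma_op op;
	uint8_t fill_pattern; /* used only by DMA_OP_FILL */
};

/* Slow path: validate the fixed attributes once and return a handle.
 * A real driver would build device-specific descriptor words here. */
static inline const struct dma_desc *
dma_prepare(uint16_t dev_id, const struct dma_desc *desc)
{
	(void)dev_id;
	return desc;
}

/* Fast path: blend the per-transfer arguments (src, dst, length)
 * with the prepared handle. Stub driver falls back to memcpy/memset. */
static inline int
dma_enqueue(uint16_t dev_id, void *src, void *dst, size_t length,
	    const struct dma_desc *desc)
{
	(void)dev_id;
	switch (desc->op) {
	case DMA_OP_COPY:
		memcpy(dst, src, length);
		return 0;
	case DMA_OP_FILL:
		memset(dst, desc->fill_pattern, length);
		return 0;
	}
	return -1;
}
```

With this shape the channel and op selection cost is paid once in dma_prepare(), and dma_enqueue() only touches the arguments that differ per transfer.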


[dpdk-dev] [PATCH] net/mlx5: fix modify field action order for MAC

2021-06-16 Thread Alexander Kozyrev
MAC addresses are split into 2 parts inside Mellanox NIC:
bits 0-15 are separate from bits 16-47. That makes a copy
from another packet field tricky because any other field
is aligned to 32 bits, not 16. This causes unexpected
results when using the MODIFY_FIELD action with MAC addresses.
Track crossing the MAC address boundary and arrange a proper
order for the MODIFY_FIELD action involving MAC addresses.

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: sta...@dpdk.org

Signed-off-by: Alexander Kozyrev 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 109 
 1 file changed, 70 insertions(+), 39 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dafd37ab93..ba341197e6 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -426,6 +426,8 @@ flow_dv_convert_modify_action(struct rte_flow_item *item,
unsigned int off_b;
uint32_t mask;
uint32_t data;
+   bool next_field = true;
+   bool next_dcopy = true;
 
if (i >= MLX5_MAX_MODIFY_NUM)
return rte_flow_error_set(error, EINVAL,
@@ -443,15 +445,13 @@ flow_dv_convert_modify_action(struct rte_flow_item *item,
size_b = sizeof(uint32_t) * CHAR_BIT -
 off_b - __builtin_clz(mask);
MLX5_ASSERT(size_b);
-   size_b = size_b == sizeof(uint32_t) * CHAR_BIT ? 0 : size_b;
actions[i] = (struct mlx5_modification_cmd) {
.action_type = type,
.field = field->id,
.offset = off_b,
-   .length = size_b,
+   .length = (size_b == sizeof(uint32_t) * CHAR_BIT) ?
+   0 : size_b,
};
-   /* Convert entire record to expected big-endian format. */
-   actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
if (type == MLX5_MODIFICATION_TYPE_COPY) {
MLX5_ASSERT(dcopy);
actions[i].dst_field = dcopy->id;
@@ -459,7 +459,27 @@ flow_dv_convert_modify_action(struct rte_flow_item *item,
(int)dcopy->offset < 0 ? off_b : dcopy->offset;
/* Convert entire record to big-endian format. */
actions[i].data1 = rte_cpu_to_be_32(actions[i].data1);
-   ++dcopy;
+   /*
+* Destination field overflow. Copy leftovers of
+* a source field to the next destination field.
+*/
+   if ((size_b > dcopy->size * CHAR_BIT) && dcopy->size) {
+   actions[i].length = dcopy->size * CHAR_BIT;
+   field->offset += dcopy->size;
+   next_field = false;
+   }
+   /*
+* Not enough bits in a source filed to fill a
+* destination field. Switch to the next source.
+*/
+   if (dcopy->size > field->size &&
+   (size_b == field->size * CHAR_BIT)) {
+   actions[i].length = field->size * CHAR_BIT;
+   dcopy->offset += field->size * CHAR_BIT;
+   next_dcopy = false;
+   }
+   if (next_dcopy)
+   ++dcopy;
} else {
MLX5_ASSERT(item->spec);
data = flow_dv_fetch_field((const uint8_t *)item->spec +
@@ -468,8 +488,11 @@ flow_dv_convert_modify_action(struct rte_flow_item *item,
data = (data & mask) >> off_b;
actions[i].data1 = rte_cpu_to_be_32(data);
}
+   /* Convert entire record to expected big-endian format. */
+   actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+   if (next_field)
+   ++field;
++i;
-   ++field;
} while (field->size);
if (resource->actions_num == i)
return rte_flow_error_set(error, EINVAL,
@@ -1433,6 +1456,7 @@ mlx5_flow_field_id_to_modify_info
struct mlx5_priv *priv = dev->data->dev_private;
struct mlx5_dev_config *config = &priv->config;
uint32_t idx = 0;
+   uint32_t off = 0;
uint64_t val = 0;
switch (data->field) {
case RTE_FLOW_FIELD_START:
@@ -1440,61 +1464,63 @@ mlx5_flow_field_id_to_modify_info
MLX5_ASSERT(false);
break;
case RTE_FLOW_FIELD_MAC_DST:
+   off = data->offset > 16 ? data->offset - 16 : 0;
  

[dpdk-dev] [PATCH] net/mlx5: convert meta register to big-endian

2021-06-16 Thread Alexander Kozyrev
Metadata is stored in the CPU order (little-endian format on x86),
while all the packet header fields are stored in the network order.
That leads to the wrong results whenever we try to use the metadata
value in the modify_field actions: bytes are swapped as a result.

Convert the metadata into the big-endian format before storing it
in the Mellanox NIC to achieve consistent behaviour.

Signed-off-by: Alexander Kozyrev 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow_dv.c  | 40 +---
 drivers/net/mlx5/mlx5_rx.c   |  5 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 22 +
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h| 30 +++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 18 ---
 5 files changed, 59 insertions(+), 56 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dafd37ab93..b36ffde559 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1239,8 +1239,8 @@ flow_dv_convert_action_set_meta
 const struct rte_flow_action_set_meta *conf,
 struct rte_flow_error *error)
 {
-   uint32_t data = conf->data;
-   uint32_t mask = conf->mask;
+   uint32_t mask = rte_cpu_to_be_32(conf->mask);
+   uint32_t data = rte_cpu_to_be_32(conf->data) & mask;
struct rte_flow_item item = {
.spec = &data,
.mask = &mask,
@@ -1253,25 +1253,14 @@ flow_dv_convert_action_set_meta
if (reg < 0)
return reg;
MLX5_ASSERT(reg != REG_NON);
-   /*
-* In datapath code there is no endianness
-* coversions for perfromance reasons, all
-* pattern conversions are done in rte_flow.
-*/
if (reg == REG_C_0) {
struct mlx5_priv *priv = dev->data->dev_private;
uint32_t msk_c0 = priv->sh->dv_regc0_mask;
-   uint32_t shl_c0;
+   uint32_t shl_c0 = rte_bsf32(msk_c0);
 
-   MLX5_ASSERT(msk_c0);
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-   shl_c0 = rte_bsf32(msk_c0);
-#else
-   shl_c0 = sizeof(msk_c0) * CHAR_BIT - rte_fls_u32(msk_c0);
-#endif
-   mask <<= shl_c0;
-   data <<= shl_c0;
-   MLX5_ASSERT(!(~msk_c0 & rte_cpu_to_be_32(mask)));
+   data = rte_cpu_to_be_32(rte_cpu_to_be_32(data) << shl_c0);
+   mask = rte_cpu_to_be_32(mask) & msk_c0;
+   mask = rte_cpu_to_be_32(mask << shl_c0);
}
reg_c_x[0] = (struct field_modify_info){4, 0, reg_to_field[reg]};
/* The routine expects parameters in memory as big-endian ones. */
@@ -9226,27 +9215,14 @@ flow_dv_translate_item_meta(struct rte_eth_dev *dev,
if (reg < 0)
return;
MLX5_ASSERT(reg != REG_NON);
-   /*
-* In datapath code there is no endianness
-* coversions for perfromance reasons, all
-* pattern conversions are done in rte_flow.
-*/
-   value = rte_cpu_to_be_32(value);
-   mask = rte_cpu_to_be_32(mask);
if (reg == REG_C_0) {
struct mlx5_priv *priv = dev->data->dev_private;
uint32_t msk_c0 = priv->sh->dv_regc0_mask;
uint32_t shl_c0 = rte_bsf32(msk_c0);
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-   uint32_t shr_c0 = __builtin_clz(priv->sh->dv_meta_mask);
 
-   value >>= shr_c0;
-   mask >>= shr_c0;
-#endif
-   value <<= shl_c0;
+   mask &= msk_c0;
mask <<= shl_c0;
-   MLX5_ASSERT(msk_c0);
-   MLX5_ASSERT(!(~msk_c0 & mask));
+   value <<= shl_c0;
}
flow_dv_match_meta_reg(matcher, key, reg, value, mask);
}
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 6cd71a44eb..777a1d6e45 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -740,8 +740,9 @@ rxq_cq_to_mbuf(struct mlx5_rxq_data *rxq, struct rte_mbuf 
*pkt,
}
}
if (rxq->dynf_meta) {
-   uint32_t meta = cqe->flow_table_metadata &
-   rxq->flow_meta_port_mask;
+   uint32_t meta = rte_be_to_cpu_32(cqe->flow_table_metadata >>
+   __builtin_popcount(rxq->flow_meta_port_mask)) &
+   rxq->flow_meta_port_mask;
 
if (meta) {
pkt->ol_flags |= rxq->flow_meta_mask;
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h 
b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 2d1154b624..648c59e2c2 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@

Re: [dpdk-dev] [PATCH v2 4/6] vhost: improve NUMA reallocation

2021-06-16 Thread Maxime Coquelin



On 6/15/21 10:42 AM, Maxime Coquelin wrote:
> This patch improves the numa_realloc() function by making use
> of rte_realloc_socket(), which takes care of the memory copy
> and freeing of the old data.
> 
> Suggested-by: David Marchand 
> Signed-off-by: Maxime Coquelin 
> ---
>  lib/vhost/vhost_user.c | 177 +
>  1 file changed, 73 insertions(+), 104 deletions(-)
> 
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 0e9e26ebe0..b298312db6 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -480,139 +480,108 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
>  static struct virtio_net*
>  numa_realloc(struct virtio_net *dev, int index)
>  {
> - int oldnode, newnode;
> + int node;
>   struct virtio_net *old_dev;
> - struct vhost_virtqueue *old_vq, *vq;
> - struct vring_used_elem *new_shadow_used_split;
> - struct vring_used_elem_packed *new_shadow_used_packed;
> - struct batch_copy_elem *new_batch_copy_elems;
> + struct vhost_virtqueue *vq;
> + struct batch_copy_elem *bce;
> + struct guest_page *gp;
> + struct rte_vhost_memory *mem;
> + size_t mem_size;
>   int ret;
>  
>   if (dev->flags & VIRTIO_DEV_RUNNING)
>   return dev;
>  
>   old_dev = dev;
> - vq = old_vq = dev->virtqueue[index];
> -
> - ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
> - MPOL_F_NODE | MPOL_F_ADDR);
> + vq = dev->virtqueue[index];
>  
> - /* check if we need to reallocate vq */
> - ret |= get_mempolicy(&oldnode, NULL, 0, old_vq,
> -  MPOL_F_NODE | MPOL_F_ADDR);
> + ret = get_mempolicy(&node, NULL, 0, vq->desc, MPOL_F_NODE | 
> MPOL_F_ADDR);
>   if (ret) {
> - VHOST_LOG_CONFIG(ERR,
> - "Unable to get vq numa information.\n");
> + VHOST_LOG_CONFIG(ERR, "Unable to get virtqueue %d numa 
> information.\n", index);
>   return dev;
>   }
> - if (oldnode != newnode) {
> - VHOST_LOG_CONFIG(INFO,
> - "reallocate vq from %d to %d node\n", oldnode, newnode);
> - vq = rte_malloc_socket(NULL, sizeof(*vq), 0, newnode);
> - if (!vq)
> - return dev;
>  
> - memcpy(vq, old_vq, sizeof(*vq));
> + vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
> + if (!vq) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node 
> %d\n",
> + index, node);
> + return dev;
> + }
>  
> - if (vq_is_packed(dev)) {
> - new_shadow_used_packed = rte_malloc_socket(NULL,
> - vq->size *
> - sizeof(struct vring_used_elem_packed),
> - RTE_CACHE_LINE_SIZE,
> - newnode);
> - if (new_shadow_used_packed) {
> - rte_free(vq->shadow_used_packed);
> - vq->shadow_used_packed = new_shadow_used_packed;
> - }
> - } else {
> - new_shadow_used_split = rte_malloc_socket(NULL,
> - vq->size *
> - sizeof(struct vring_used_elem),
> - RTE_CACHE_LINE_SIZE,
> - newnode);
> - if (new_shadow_used_split) {
> - rte_free(vq->shadow_used_split);
> - vq->shadow_used_split = new_shadow_used_split;
> - }
> - }
> + if (vq != dev->virtqueue[index]) {
> + VHOST_LOG_CONFIG(INFO, "reallocated virtqueue on node %d\n", 
> node);
> + dev->virtqueue[index] = vq;
> + vhost_user_iotlb_init(dev, index);
> + }
>  
> - new_batch_copy_elems = rte_malloc_socket(NULL,
> - vq->size * sizeof(struct batch_copy_elem),
> - RTE_CACHE_LINE_SIZE,
> - newnode);
> - if (new_batch_copy_elems) {
> - rte_free(vq->batch_copy_elems);
> - vq->batch_copy_elems = new_batch_copy_elems;
> + if (vq_is_packed(dev)) {
> + struct vring_used_elem_packed *sup;
> +
> + sup = rte_realloc_socket(vq->shadow_used_packed, vq->size * 
> sizeof(*sup),
> + RTE_CACHE_LINE_SIZE, node);
> + if (!sup) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow packed 
> on node %d\n", node);
> + return dev;
>   }
> + vq->shadow_used_packed = sup;
>  
> - if (vq->log_cache) {
> - struct log_cache_entry *log_cache;
> + } else {
> + 

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, 16 June 2021 15.03
> 
> On Wed, Jun 16, 2021 at 01:27:17PM +0200, Morten Brørup wrote:
> > > From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> > > Sent: Wednesday, 16 June 2021 11.42
> > >
> > > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon
> 
> > > wrote:
> > > >
> > > > 14/06/2021 17:48, Morten Brørup:
> > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > > Monjalon
> > > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > > something big enough to hold a sufficiently large array. And
> possibly
> > > add an rte_max_ethports variable to indicate the number of
> populated
> > > entries in the array, for use when iterating over the array.
> > > > >
> > > > > Can we come up with another example than RTE_MAX_ETHPORTS where
> > > this library provides a better benefit?
> > > >
> > > > What is big enough?
> > > > Is 640KB enough for RAM? ;)
> > >
> > > If I understand it correctly, Linux process allocates 640KB due to
> > > that fact currently
> > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and
> it
> > > is from BSS.
> >
> > Correct.
> >
> > > If we make this from heap i.e use malloc() to allocate this memory
> > > then in my understanding Linux
> > > really won't allocate the real page for backend memory until
> unless,
> > > someone write/read to this memory.
> >
> > If the array is allocated from the heap, its members will be accessed
> though a pointer to the array, e.g. in rte_eth_rx/tx_burst(). This
> might affect performance, which is probably why the array is allocated
> the way it is.
> >
> 
> It depends on whether the array contains pointers to malloced elements
> or
> the array itself is just a single malloced array of all the structures.
> While I think the parray proposal referred to the former - which would
> have
> an extra level of indirection - the switch we are discussing here is
> the
> latter which should have no performance difference, since the method of
> accessing the elements will be the same, only with the base address
> pointing to a different area of memory.

I was not talking about an array of pointers. And it is not the same:

int arr[27];
int * parr = arr;

// direct access
int dir(int i) { return arr[i]; }

// indirect access
int indir(int i) { return parr[i]; }

The direct access knows the address of arr, so it will compile to:
movsx   rdi, edi
mov eax, DWORD PTR arr[0+rdi*4]
ret

The indirect access needs to first read the memory location holding the pointer 
to the array, and then it can read the array member, so it will compile to:
mov rax, QWORD PTR parr[rip]
movsx   rdi, edi
mov eax, DWORD PTR [rax+rdi*4]
ret

> 
> > Although it might be worth investigating how much it actually affects
> the performance.
> >
> > So we need to do something else if we want to conserve memory and
> still allow a large rte_eth_devices[] array.
> >
> > Looking at struct rte_eth_dev, we could reduce its size as follows:
> >
> > 1. Change the two callback arrays
> post_rx/pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] to pointers to
> callback arrays, which are allocated from the heap.
> > With the default RTE_MAX_QUEUES_PER_PORT of 1024, these two arrays
> are the sinners that make the struct rte_eth_dev use so much memory.
> This modification would save 16 KB (minus 16 bytes for the pointers to
> the two arrays) per port.
> > Furthermore, these callback arrays would only need to be allocated if
> the application is compiled with callbacks enabled (#define
> RTE_ETHDEV_RXTX_CALLBACKS). And they would only need to be sized to the
> actual number of queues for the port.
> >
> > The disadvantage is that this would add another level of indirection,
> although only for applications compiled with callbacks enabled.
> >
> This seems reasonable to at least investigate.
> 
> > 2. Remove reserved_64s[4] and reserved_ptrs[4]. This would save 64
> bytes per port. Not much, but worth considering if we are changing the
> API/ABI anyway.
> >
> I strongly dislike reserved fields, so I would tend to favour these.
> However, it does possibly reduce future compatibility if we do need to
> add something to ethdev.

There should be an official policy about adding reserved fields for future 
compatibility. I'm against adding them unless it can be argued that they are 
likely to match what is needed in the future; in the real world there is no 
way to know whether they will match future requirements.

> 
> Another option is to split ethdev into fast-path and non-fastpath parts
> -
> similar to Konstantin's suggestion of just having an array of the ops.
> We
> can have an array of minimal structures with fastpath ops and queue
> pointers, for example, with an ethdev-private pointer to the rest of
> the
> struct elsewhere in memory. Since that second struct would be allocated
> on-demand, the size of the ethdev 

Re: [dpdk-dev] [PATCH v3] ethdev: add IPv4 and L4 checksum RSS offload types

2021-06-16 Thread Zhang, Qi Z


> -Original Message-
> From: Jerin Jacob 
> Sent: Tuesday, June 15, 2021 4:26 PM
> To: Zhang, AlvinX 
> Cc: Zhang, Qi Z ; Andrew Rybchenko
> ; Ajit Khaparde
> ; dpdk-dev 
> Subject: Re: [dpdk-dev] [PATCH v3] ethdev: add IPv4 and L4 checksum RSS
> offload types
> 
> On Tue, Jun 15, 2021 at 1:50 PM Alvin Zhang 
> wrote:
> >
> > This patch defines new RSS offload types for IPv4 and L4 checksum,
> > which are required when users want to distribute packets based on the
> > IPv4 or L4 checksum field.
> 
> What is the usecase for distribution based on L4/IPv4 checksum?
> Is it something like HW has the feature so expose it or there is some real use
> case for this application?

This is for a real use case; there is research on using the TCP checksum for 
FDIR on ixgbe:
https://hsadok.com/papers/sprayer-hotnets18.pdf

We are looking for a similar solution in ice, and checksum RSS is the feature 
we need to have.

> 
> 
> 
> > For example "flow create 0 ingress pattern eth / ipv4 / end actions
> > rss types ipv4-chksum end queues end / end", this flow causes all
> > matching packets to be distributed to queues on basis of IPv4
> > checksum.
> >
> > Signed-off-by: Alvin Zhang 
> > Reviewed-by: Andrew Rybchenko 
> > Acked-by: Ajit Khaparde 
> > ---
> >
> > v3: Add L4 checksum RSS offload type
> > ---
> >  app/test-pmd/cmdline.c  | 4 
> >  app/test-pmd/config.c   | 2 ++
> >  lib/ethdev/rte_ethdev.h | 2 ++
> >  3 files changed, 8 insertions(+)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 0268b18..6148d84 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -2254,6 +2254,10 @@ struct cmd_config_rss {
> > rss_conf.rss_hf = ETH_RSS_ECPRI;
> > else if (!strcmp(res->value, "mpls"))
> > rss_conf.rss_hf = ETH_RSS_MPLS;
> > +   else if (!strcmp(res->value, "ipv4-chksum"))
> > +   rss_conf.rss_hf = ETH_RSS_IPV4_CHKSUM;
> > +   else if (!strcmp(res->value, "l4-chksum"))
> > +   rss_conf.rss_hf = ETH_RSS_L4_CHKSUM;
> > else if (!strcmp(res->value, "none"))
> > rss_conf.rss_hf = 0;
> > else if (!strcmp(res->value, "level-default")) { diff --git
> > a/app/test-pmd/config.c b/app/test-pmd/config.c index 43c79b5..14968bf
> > 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -140,6 +140,8 @@
> > { "gtpu", ETH_RSS_GTPU },
> > { "ecpri", ETH_RSS_ECPRI },
> > { "mpls", ETH_RSS_MPLS },
> > +   { "ipv4-chksum", ETH_RSS_IPV4_CHKSUM },
> > +   { "l4-chksum", ETH_RSS_L4_CHKSUM },
> > { NULL, 0 },
> >  };
> >
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > faf3bd9..1268729 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -537,6 +537,8 @@ struct rte_eth_rss_conf {
> >  #define ETH_RSS_PPPOE (1ULL << 31)
> >  #define ETH_RSS_ECPRI (1ULL << 32)
> >  #define ETH_RSS_MPLS  (1ULL << 33)
> > +#define ETH_RSS_IPV4_CHKSUM   (1ULL << 34)
> > +#define ETH_RSS_L4_CHKSUM (1ULL << 35)
> >
> >  /*
> >   * We use the following macros to combine with above ETH_RSS_* for
> > --
> > 1.8.3.1
> >


[dpdk-dev] [PATCH] lib/rte_rib6: fix stack buffer overflow

2021-06-16 Thread ohilyard
From: Owen Hilyard 

ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix was to make sure the effective depth is always < 128:
when depth = 128, the index into the IP address became 16,
reading off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.

Signed-off-by: Owen Hilyard 
---
 lib/rib/rte_rib6.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c
index f6c55ee45..2de50449d 100644
--- a/lib/rib/rte_rib6.c
+++ b/lib/rib/rte_rib6.c
@@ -79,14 +79,20 @@ is_covered(const uint8_t ip1[RTE_RIB6_IPV6_ADDR_SIZE],
 static inline int
 get_dir(const uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE], uint8_t depth)
 {
-   int i = 0;
-   uint8_t p_depth, msk;
-
-   for (p_depth = depth; p_depth >= 8; p_depth -= 8)
-   i++;
-
-   msk = 1 << (7 - p_depth);
-   return (ip[i] & msk) != 0;
+   int index, msk;
+   /* depth & 127 clamps depth to values that will not
+* read off the end of ip.
+* depth is the number of bits deep into ip to traverse, and
+* is incremented in blocks of 8 (1 byte). This means the last
+* 3 bits are irrelevant to what the index of ip should be.
+*/
+   index = (depth & 127) >> 3;
+   /*
+* msk is the bitmask used to extract the bit used to decide the
+* direction of the next step of the binary search.
+*/
+   msk = 1 << (7 - (depth & 7));
+   return (ip[index] & msk) != 0;
 }
 
 static inline struct rte_rib6_node *
-- 
2.30.2
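As an illustrative sanity check (not part of the patch), the loop-based and bitwise forms of get_dir() can be compared for every in-bounds depth, and the clamping behaviour at depth = 128 verified: index = depth >> 3 and mask = 1 << (7 - depth % 8) are exactly what the loop computed.

```c
#include <stdint.h>

#define IPV6_ADDR_SIZE 16

/* Original loop-based version (only valid for depth < 128). */
static int
get_dir_loop(const uint8_t ip[IPV6_ADDR_SIZE], uint8_t depth)
{
	int i = 0;
	uint8_t p_depth, msk;

	for (p_depth = depth; p_depth >= 8; p_depth -= 8)
		i++;
	msk = 1 << (7 - p_depth);
	return (ip[i] & msk) != 0;
}

/* New bitwise version from the patch above. */
static int
get_dir_bitwise(const uint8_t ip[IPV6_ADDR_SIZE], uint8_t depth)
{
	int index = (depth & 127) >> 3;
	int msk = 1 << (7 - (depth & 7));

	return (ip[index] & msk) != 0;
}

/* Returns 1 if both versions agree for all in-bounds depths. */
static int
check_equivalence(void)
{
	uint8_t ip[IPV6_ADDR_SIZE];
	unsigned int depth, i;

	for (i = 0; i < IPV6_ADDR_SIZE; i++)
		ip[i] = (uint8_t)(0xA5 ^ i); /* arbitrary test pattern */
	for (depth = 0; depth < 128; depth++)
		if (get_dir_loop(ip, depth) != get_dir_bitwise(ip, depth))
			return 0;
	return 1;
}
```

For depth = 128, `depth & 127` yields 0, so the bitwise version reads bit 7 of ip[0] instead of the out-of-bounds ip[16] that the loop version accessed.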



[dpdk-dev] [PATCH] tests/test_eal_flags: fix memory leak

2021-06-16 Thread ohilyard
From: Owen Hilyard 

The directory stream was not closed when the hugepage action was
HUGEPAGE_CHECK_EXISTS. This caused a memory leak in some parts of
the unit tests.

Signed-off-by: Owen Hilyard 
---
 app/test/test_eal_flags.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 932fbe3d0..0c1e0fb21 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -125,6 +125,7 @@ process_hugefiles(const char * prefix, enum hugepage_action 
action)
{
/* file exists, return */
result = 1;
+   closedir(hugepage_dir);
goto end;
}
break;
-- 
2.30.2



[dpdk-dev] [PATCH] tests/cmdline: fix memory leaks

2021-06-16 Thread ohilyard
From: Owen Hilyard 

Fixes for a few memory leaks in the cmdline_autotest unit test.

All of the leaks were related to not freeing the commandline struct
after testing had completed.

Signed-off-by: Owen Hilyard 
---
 app/test/test_cmdline_lib.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/app/test/test_cmdline_lib.c b/app/test/test_cmdline_lib.c
index bd72df0da..fd0a797c1 100644
--- a/app/test/test_cmdline_lib.c
+++ b/app/test/test_cmdline_lib.c
@@ -71,10 +71,12 @@ test_cmdline_parse_fns(void)
if (cmdline_complete(cl, "buffer", &i, NULL, sizeof(dst)) >= 0)
goto error;
 
+   cmdline_free(cl);
return 0;
 
 error:
printf("Error: function accepted null parameter!\n");
+   cmdline_free(cl);
return -1;
 }
 
@@ -140,32 +142,45 @@ static int
 test_cmdline_socket_fns(void)
 {
cmdline_parse_ctx_t ctx;
+   struct cmdline *cl;
 
-   if (cmdline_stdin_new(NULL, "prompt") != NULL)
+   cl = cmdline_stdin_new(NULL, "prompt");
+   if (cl != NULL)
goto error;
-   if (cmdline_stdin_new(&ctx, NULL) != NULL)
+   cl = cmdline_stdin_new(&ctx, NULL);
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(NULL, "prompt", "/dev/null") != NULL)
+   cl = cmdline_file_new(NULL, "prompt", "/dev/null");
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(&ctx, NULL, "/dev/null") != NULL)
+   cl = cmdline_file_new(&ctx, NULL, "/dev/null");
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(&ctx, "prompt", NULL) != NULL)
+   cl = cmdline_file_new(&ctx, "prompt", NULL);
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(&ctx, "prompt", "-/invalid/~/path") != NULL) {
+   cl = cmdline_file_new(&ctx, "prompt", "-/invalid/~/path");
+   if (cl != NULL) {
printf("Error: succeeded in opening invalid file for reading!");
+   cmdline_free(cl);
return -1;
}
-   if (cmdline_file_new(&ctx, "prompt", "/dev/null") == NULL) {
+   cl = cmdline_file_new(&ctx, "prompt", "/dev/null");
+   if (cl == NULL) {
printf("Error: failed to open /dev/null for reading!");
+   cmdline_free(cl);
return -1;
}
 
/* void functions */
cmdline_stdin_exit(NULL);
-
+   if (cl != NULL)
+   cmdline_free(cl);
return 0;
 error:
printf("Error: function accepted null parameter!\n");
+   if (cl != NULL)
+   cmdline_free(cl);
return -1;
 }
 
@@ -198,6 +213,7 @@ test_cmdline_fns(void)
cmdline_interact(NULL);
cmdline_quit(NULL);
 
+   cmdline_free(cl);
return 0;
 
 error:
-- 
2.30.2



Re: [dpdk-dev] [PATCH] tests/test_eal_flags: fix memory leak

2021-06-16 Thread David Marchand
On Wed, Jun 16, 2021 at 6:26 PM  wrote:
>
> From: Owen Hilyard 
>
> The directory stream was not closed when the hugepage action was
> HUGEPAGE_CHECK_EXISTS. This caused a memory leak in some parts of
> the unit tests.
>
> Signed-off-by: Owen Hilyard 

Reviewed-by: David Marchand 


-- 
David Marchand



Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64

2021-06-16 Thread Honnappa Nagarahalli


> 
> Hi, everyone
> 
> This patch can be closed for the following reasons.
> 
> > -----Original Message-----
> > From: dev on behalf of Honnappa Nagarahalli
> > Sent: 28 March 2021 9:00
> > To: tho...@monjalon.net; Takeshi Yoshimura
> > 
> > Cc: sta...@dpdk.org; dev@dpdk.org; olivier.m...@6wind.com;
> > chao...@linux.vnet.ibm.com; konstantin.anan...@intel.com; Jerin Jacob
> > ; nd ; nd
> 
> > Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy
> > dequeue/enqueue in ppc64
> >
> > 
> >
> > > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] rte_ring: fix racy
> > > dequeue/enqueue in ppc64
> > >
> > > No reply after more than 2 years.
> > > Unfortunately it is probably outdated now.
> > > Classified as "Changes Requested".
> > Looking at the code, I think this patch in fact fixes a bug.
> > Appreciate rebasing this patch.
> >
> > The problem is already fixed in '__rte_ring_move_cons_head' but needs
> > to be fixed in '__rte_ring_move_prod_head'.
> > This problem is fixed for C11 version due to acquire load of cons.tail
> > and prod.tail.
> 
> First, for the consumer in dequeue:
> the reason for adding an rmb in move_cons_head of the "generic" version is
> based on this patch:
> http://patches.dpdk.org/project/dpdk/patch/1552409933-45684-2-git-send-
> email-gavin...@arm.com/
> 
> Slot  Consumer                              Producer
> 1     dequeue elements
> 2                                           update prod_tail
> 3     load new prod_tail
> 4     check room is enough (n < entries)
> 
> Dequeuing the elements may happen before loading the updated prod_tail, so
> the consumer can load incorrect element values.
> For the multiple-consumers dequeue case, 'rte_atomic32_cmpset' with acquire
> and release order prevents dequeuing before loading prod_tail, so no extra
> rmb is needed.
> 
> Second, for the single producer in enqueue:
> 
> Slot  Producer                              Consumer
> 1     enqueue elements (not committed)
> 2                                           update consumer_tail
> 3     load new consumer_tail
> 4     check room is enough (n < entries)
> 5     enqueued elements are committed
> 
> Though enqueuing the elements may be reordered before loading consumer_tail,
> these elements will not be committed until the 'check' has finished. So the
> load-to-store control dependency is reliable and no rmb is needed here.
> [1] https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf (page 15)
> 
> As a result, thanks to the control dependency, it is unnecessary to add an
> rmb for the single-producer enqueue, and this patch can be closed.
Thanks Feifei. In my comments below, I did not consider the control dependency 
from load to store, which is reliable.
Agree, we can reject this patch.
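For readers following the acquire/release reasoning above, here is a minimal single-producer/single-consumer ring sketch in C11 atomics. It is an illustrative stand-in, not the rte_ring code: the consumer's acquire load of prod_tail plays the role of the rmb under discussion, and on the producer side the element store is only reached after the free-space check passes, which is the load-to-store control dependency described above.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8 /* power of two */
#define RING_MASK (RING_SIZE - 1)

struct spsc_ring {
	_Atomic uint32_t prod_tail; /* published by the producer */
	_Atomic uint32_t cons_tail; /* published by the consumer */
	uint32_t ring[RING_SIZE];
};

static int
spsc_enqueue(struct spsc_ring *r, uint32_t val)
{
	uint32_t prod = atomic_load_explicit(&r->prod_tail,
					     memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->cons_tail,
					     memory_order_acquire);

	if (prod - cons >= RING_SIZE)
		return -1; /* full: the store below is never reached */
	r->ring[prod & RING_MASK] = val;
	/* Release: the element store above cannot pass this publication. */
	atomic_store_explicit(&r->prod_tail, prod + 1, memory_order_release);
	return 0;
}

static int
spsc_dequeue(struct spsc_ring *r, uint32_t *val)
{
	uint32_t cons = atomic_load_explicit(&r->cons_tail,
					     memory_order_relaxed);
	/* Acquire: element loads below cannot be hoisted before this load,
	 * which is the reordering the rmb in the "generic" version stops. */
	uint32_t prod = atomic_load_explicit(&r->prod_tail,
					     memory_order_acquire);

	if (prod == cons)
		return -1; /* empty */
	*val = r->ring[cons & RING_MASK];
	atomic_store_explicit(&r->cons_tail, cons + 1, memory_order_release);
	return 0;
}
```

On x86 both acquire loads compile to plain loads; on weaker models such as ppc64 or AArch64 the acquire on prod_tail is what prevents the consumer from reading elements before it has seen the producer's publication.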

> 
> Best Regards
> Feifei
> >
> > >
> > >
> > > 17/07/2018 05:34, Jerin Jacob:
> > > > From: Takeshi Yoshimura 
> > > >
> > > > Cc: olivier.m...@6wind.com
> > > > Cc: chao...@linux.vnet.ibm.com
> > > > Cc: konstantin.anan...@intel.com
> > > >
> > > > >
> > > > > > Adding rte_smp_rmb() cause performance regression on non x86
> > > platforms.
> > > > > > Having said that, load-load barrier can be expressed very
> > > > > > well with C11 memory model. I guess ppc64 supports C11 memory
> model.
> > > > > > If so, Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y
> > for
> > > > > > ppc64 and check original issue?
> > > > >
> > > > > Yes, the performance regression happens on non-x86 with single
> > > > > producer/consumer.
> > > > > The average latency of an enqueue was increased from 21 nsec to
> > > > > 24 nsec in my simple experiment. But, I think it is worth it.
> > > >
> > > > That varies to machine to machine. What is the burst size etc.
> > > >
> > > > >
> > > > >
> > > > > I also tested C11 rte_ring, however, it caused the same race
> > > > > condition in
> > > ppc64.
> > > > > I tried to fix the C11 problem as well, but I also found the C11
> > > > > rte_ring had other potential incorrect choices of memory orders,
> > > > > which caused another race condition in ppc64.
> > > >
> > > > Does it happens on all ppc64 machines? Or on a specific machine?
> > > > Is following tests are passing on your system without the patch?
> > > > test/test/test_ring_perf.c
> > > > test/test/test_ring.c
> > > >
> > > > >
> > > > > For example,
> > > > > __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(),
> > but I
> > > > > am not sure why the load-acquire is used for the compare exchange.
> > > >
> > > > It is correct as per C11 acquire and release semantics.
> > > >
> > > > > Also in update_tail, the pause can be called before the data
> > > > > copy, because ht->tail is loaded without atomic_load_n.
> > > > >
> > > > > The memory order is simply difficult, so it might take a bit
> > > > >

Re: [dpdk-dev] [PATCH] tests/cmdline: fix memory leaks

2021-06-16 Thread David Marchand
On Wed, Jun 16, 2021 at 6:26 PM  wrote:
>
> From: Owen Hilyard 
>
> Fixes for a few memory leaks in the cmdline_autotest unit test.
>
> All of the leaks were related to not freeing the commandline struct
> after testing had completed.

We will need a Fixes: tag and Cc: stable.


>
> Signed-off-by: Owen Hilyard 
> ---
>  app/test/test_cmdline_lib.c | 32 
>  1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/app/test/test_cmdline_lib.c b/app/test/test_cmdline_lib.c
> index bd72df0da..fd0a797c1 100644
> --- a/app/test/test_cmdline_lib.c
> +++ b/app/test/test_cmdline_lib.c
> @@ -71,10 +71,12 @@ test_cmdline_parse_fns(void)
> if (cmdline_complete(cl, "buffer", &i, NULL, sizeof(dst)) >= 0)
> goto error;
>
> +   cmdline_free(cl);
> return 0;
>
>  error:
> printf("Error: function accepted null parameter!\n");
> +   cmdline_free(cl);
> return -1;
>  }
>
> @@ -140,32 +142,45 @@ static int
>  test_cmdline_socket_fns(void)
>  {
> cmdline_parse_ctx_t ctx;
> +   struct cmdline *cl;
>
> -   if (cmdline_stdin_new(NULL, "prompt") != NULL)
> +   cl = cmdline_stdin_new(NULL, "prompt");
> +   if (cl != NULL)
> goto error;
> -   if (cmdline_stdin_new(&ctx, NULL) != NULL)
> +   cl = cmdline_stdin_new(&ctx, NULL);
> +   if (cl != NULL)
> goto error;
> -   if (cmdline_file_new(NULL, "prompt", "/dev/null") != NULL)
> +   cl = cmdline_file_new(NULL, "prompt", "/dev/null");
> +   if (cl != NULL)
> goto error;
> -   if (cmdline_file_new(&ctx, NULL, "/dev/null") != NULL)
> +   cl = cmdline_file_new(&ctx, NULL, "/dev/null");
> +   if (cl != NULL)
> goto error;
> -   if (cmdline_file_new(&ctx, "prompt", NULL) != NULL)
> +   cl = cmdline_file_new(&ctx, "prompt", NULL);
> +   if (cl != NULL)
> goto error;
> -   if (cmdline_file_new(&ctx, "prompt", "-/invalid/~/path") != NULL) {
> +   cl = cmdline_file_new(&ctx, "prompt", "-/invalid/~/path");
> +   if (cl != NULL) {
> printf("Error: succeeded in opening invalid file for 
> reading!");
> +   cmdline_free(cl);
> return -1;
> }
> -   if (cmdline_file_new(&ctx, "prompt", "/dev/null") == NULL) {
> +   cl = cmdline_file_new(&ctx, "prompt", "/dev/null");
> +   if (cl == NULL) {
> printf("Error: failed to open /dev/null for reading!");
> +   cmdline_free(cl);
> return -1;
> }
>
> /* void functions */
> cmdline_stdin_exit(NULL);
> -
> +   if (cl != NULL)
> +   cmdline_free(cl);

We made sure cl != NULL above, no need for this test.


> return 0;
>  error:
> printf("Error: function accepted null parameter!\n");
> +   if (cl != NULL)
> +   cmdline_free(cl);

And anyway, cmdline_free() handles a NULL pointer fine, so you can
also remove the check here.

With commitlog updated and those two changes, you can add my:
Reviewed-by: David Marchand 

> return -1;
>  }
>
> @@ -198,6 +213,7 @@ test_cmdline_fns(void)
> cmdline_interact(NULL);
> cmdline_quit(NULL);
>
> +   cmdline_free(cl);
> return 0;
>
>  error:
> --
> 2.30.2
>


-- 
David Marchand
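The convention David points to — cmdline_free() tolerating a NULL pointer, like the standard free() — is what allows a single unconditional cleanup path in the test. A plain-C sketch of that pattern follows; `struct res`, `res_new()` and `res_free()` are hypothetical stand-ins for struct cmdline and its functions:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cmdline / cmdline_free(). */
struct res {
	char *buf;
};

static void
res_free(struct res *r)
{
	if (r == NULL)  /* tolerate NULL, like free() and cmdline_free() */
		return;
	free(r->buf);
	free(r);
}

static struct res *
res_new(int fail)
{
	struct res *r;

	if (fail)
		return NULL;  /* simulate an allocation/open failure */
	r = calloc(1, sizeof(*r));
	if (r == NULL)
		return NULL;
	r->buf = calloc(1, 16);
	if (r->buf == NULL) {
		free(r);
		return NULL;
	}
	return r;
}

/* Caller needs no "if (r != NULL)" guard around the cleanup. */
static int
use_res(int fail)
{
	struct res *r = res_new(fail);
	int ret = (r != NULL) ? 0 : -1;

	res_free(r);  /* safe on both the success and failure paths */
	return ret;
}
```

Because the free function accepts NULL, the success and error exits of the caller can share one cleanup statement, exactly as David suggests for the cmdline test.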



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Honnappa Nagarahalli


> 
> On Wed, Jun 16, 2021 at 02:14:54PM +0200, David Marchand wrote:
> > On Tue, Jun 15, 2021 at 3:25 PM Chengwen Feng
>  wrote:
> > > +
> > > +#define RTE_DMADEV_NAME_MAX_LEN(64)
> > > +/**< @internal Max length of name of DMA PMD */
> > > +
> > > +/** @internal
> > > + * The data structure associated with each DMA device.
> > > + */
> > > +struct rte_dmadev {
> > > +   /**< Device ID for this instance */
> > > +   uint16_t dev_id;
> > > +   /**< Functions exported by PMD */
> > > +   const struct rte_dmadev_ops *dev_ops;
> > > +   /**< Device info. supplied during device initialization */
> > > +   struct rte_device *device;
> > > +   /**< Driver info. supplied by probing */
> > > +   const char *driver_name;
> > > +
> > > +   /**< Device name */
> > > +   char name[RTE_DMADEV_NAME_MAX_LEN]; } __rte_cache_aligned;
> > > +
> >
> > I see no queue/channel notion.
> > How does a rte_dmadev object relate to a physical hw engine?
> >
> One queue, one device.
> When updating the ioat driver for the 20.11 release to add the idxd part,
> I considered adding a queue parameter to the API so it would look like one
> device with multiple queues. However, since each queue acts completely
> independently of the others, there was no benefit to doing so. It's just
> easier to have a single id to identify a device queue.
Does that mean the queue is multi-thread safe? Do we need per-core queues to
avoid locking?


Re: [dpdk-dev] [PATCH] tests/test_eal_flags: fix memory leak

2021-06-16 Thread David Marchand
On Wed, Jun 16, 2021 at 6:37 PM David Marchand
 wrote:
>
> On Wed, Jun 16, 2021 at 6:26 PM  wrote:
> >
> > From: Owen Hilyard 
> >
> > The directory steam was not closed when the hugepage action was
> > HUGEPAGE_CHECK_EXISTS. This caused a memory leak in some parts of
> > the unit tests.

Just forgot to ask for a Fixes: tag.
I guess this is:
Fixes: 45f1b6e8680a ("app: add new tests on eal flags")

I will add it when applying.

-- 
David Marchand



Re: [dpdk-dev] [PATCH] lib/rte_rib6: fix stack buffer overflow

2021-06-16 Thread Stephen Hemminger
On Wed, 16 Jun 2021 12:07:29 -0400
ohily...@iol.unh.edu wrote:

> From: Owen Hilyard 
> 
> ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
> The fix for the stack buffer overflow was to make sure depth
> was always < 128, since when depth = 128 it caused the index
> into the ip address to be 16, which read off the end of the array.
> 
> While trying to solve the buffer overflow, I noticed that a few
> changes could be made to remove the for loop entirely.
> 
> Signed-off-by: Owen Hilyard 
> ---
>  lib/rib/rte_rib6.c | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c
> index f6c55ee45..2de50449d 100644
> --- a/lib/rib/rte_rib6.c
> +++ b/lib/rib/rte_rib6.c
> @@ -79,14 +79,20 @@ is_covered(const uint8_t ip1[RTE_RIB6_IPV6_ADDR_SIZE],
>  static inline int
>  get_dir(const uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE], uint8_t depth)
>  {
> - int i = 0;
> - uint8_t p_depth, msk;
> -
> - for (p_depth = depth; p_depth >= 8; p_depth -= 8)
> - i++;
> -
> - msk = 1 << (7 - p_depth);
> - return (ip[i] & msk) != 0;
> + int index, msk;
> + /* depth & 127 clamps depth to values that will not

Please put a blank line after the declarations for clarity.
Since index and msk never hold negative values, please make them unsigned.
Better yet, size them to the appropriate number of bits.

> +  * read off the end of ip.
> +  * depth is the number of bits deep into ip to traverse, and
> +  * is incremented in blocks of 8 (1 byte). This means the last
> +  * 3 bits are irrelevant to what the index of ip should be.
> +  */
> + index = (depth & 127) >> 3;
> + /*
> +  * msk is the bitmask used to extract the bit used to decide the
> +  * direction of the next step of the binary search.
> +  */
> + msk = 1 << (7 - (depth & 7));
> + return (ip[index] & msk) != 0;
>  }
>  
>  static inline struct rte_rib6_node *



[dpdk-dev] [Bug 739] Building DPDK with gcc 10.2.1 Generates Multiple Warning Messages

2021-06-16 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=739

Bug ID: 739
   Summary: Building DPDK with gcc 10.2.1 Generates Multiple
Warning Messages
   Product: DPDK
   Version: unspecified
  Hardware: POWER
OS: Linux
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: other
  Assignee: dev@dpdk.org
  Reporter: d...@linux.vnet.ibm.com
  Target Milestone: ---

Created attachment 162
  --> https://bugs.dpdk.org/attachment.cgi?id=162&action=edit
Full Build Output Including Warnings

Building DPDK 21.08-rc0 with gcc 10.2.1 on RHEL 8.3 generates multiple
warnings, though the final executables appear to work correctly. Examples of
such warnings include:

[115/2415] Compiling C object
'lib/76b5a35@@rte_eal@sta/eal_common_eal_common_trace_utils.c.o'.
../lib/eal/common/eal_common_trace_utils.c: In function
‘trace_epoch_time_save’:
../lib/eal/common/eal_common_trace_utils.c:284:22: note: the layout of
aggregates containing vectors with 8-byte alignment has changed in GCC 5
  284 |  trace->uptime_ticks = avg;
  |  ^
[225/2415] Compiling C object 'lib/76b5a35@@rte_acl@sta/acl_acl_gen.c.o'.
../lib/acl/acl_gen.c: In function ‘acl_count_trie_types.constprop’:
../lib/acl/acl_gen.c:220:1: note: the layout of aggregates containing vectors
with 4-byte alignment has changed in GCC 5
  220 | acl_count_trie_types(struct acl_node_counters *counts,
  | ^~~~

...

[320/2415] Compiling C object 'lib/76b5a35@@rte_hash@sta/hash_rte_thash.c.o'.
In file included from ../lib/mempool/rte_mempool.h:51,
 from ../lib/mbuf/rte_mbuf.h:38,
 from ../lib/net/rte_ip.h:31,
 from ../lib/hash/rte_thash.h:29,
 from ../lib/hash/rte_thash.c:5:
In function ‘rte_memcpy_func’,
inlined from ‘rte_thash_init_ctx’ at ../lib/hash/rte_thash.c:232:3:
../lib/eal/ppc/include/rte_memcpy.h:50:2: warning: writing 16 bytes into a
region of size 0 [-Wstringop-overflow=]
   50 |  vec_vsx_st(vec_vsx_ld(0, src), 0, dst);
  |  ^~

...

[323/2415] Compiling C object 'lib/76b5a35@@rte_lpm@sta/lpm_rte_lpm.c.o'.
../lib/lpm/rte_lpm.c: In function ‘rte_lpm_create’:
../lib/lpm/rte_lpm.c:240:19: note: the layout of aggregates containing vectors
with 8-byte alignment has changed in GCC 5
  240 |  i_lpm->max_rules = config->max_rules;
  |  ~^~~
[328/2415] Compiling C object
'drivers/a715181@@tmp_rte_net_bnxt@sta/net_bnxt_tf_ulp_bnxt_ulp_flow.c.o'.
../drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c: In function
‘bnxt_ulp_init_mapper_params’:
../drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c:78:1: note: the layout of aggregates
containing vectors with 8-byte alignment has changed in GCC 5
   78 | bnxt_ulp_init_mapper_params(struct bnxt_ulp_mapper_create_parms
*mapper_cparms,
  | ^~~

...

[1153/2415] Compiling C object
'drivers/a715181@@tmp_rte_net_ena@sta/net_ena_ena_ethdev.c.o'.
In file included from ../lib/net/rte_ether.h:21,
 from ../drivers/net/ena/ena_ethdev.c:7:
../drivers/net/ena/ena_ethdev.c: In function ‘ena_rss_key_fill’:
../lib/eal/ppc/include/rte_memcpy.h:53:2: warning: array subscript 3 is outside
array bounds of ‘uint8_t[40]’ {aka ‘unsigned char[40]’} [-Warray-bounds]
   53 |  vec_vsx_st(vec_vsx_ld(48, src), 48, dst);
  |  ^~
../drivers/net/ena/ena_ethdev.c:277:17: note: while referencing ‘default_key’
  277 |  static uint8_t default_key[ENA_HASH_KEY_SIZE];
  | ^~~
In file included from ../lib/net/rte_ether.h:21,
 from ../drivers/net/ena/ena_ethdev.c:7:
../lib/eal/ppc/include/rte_memcpy.h:53:2: warning: array subscript [3, 7] is
outside array bounds of ‘uint8_t[40]’ {aka ‘unsigned char[40]’}
[-Warray-bounds]
   53 |  vec_vsx_st(vec_vsx_ld(48, src), 48, dst);
  |  ^~
../drivers/net/ena/ena_ethdev.c:277:17: note: while referencing ‘default_key’
  277 |  static uint8_t default_key[ENA_HASH_KEY_SIZE];
  | ^~~

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [dpdk-dev] [PATCH v1] net/i40e: fix flow director does not work

2021-06-16 Thread Thomas Monjalon
01/06/2021 13:12, Zhang, Qi Z:
> > > When user configured the flow rule with raw packet via command
> > > "flow_director_filter", it would reset all previous fdir input set
> > > flags with "i40e_flow_set_fdir_inset()".
> > >
> > > Ignore to configure the flow input set with raw packet rule used.
> > >
> > > Fixes: ff04964ea6d5 ("net/i40e: fix flow director for common pctypes")
> > >
> > > Signed-off-by: Steve Yang 
> > 
> > Acked-by: Beilei Xing 
> 
> Applied to dpdk-next-net-intel.

Why Cc:stable is not added?




[dpdk-dev] [Bug 740] event/cnxk/cnxk_tim_worker build failures on Ubuntu 20.04 cross-compile with clang 10

2021-06-16 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=740

Bug ID: 740
   Summary: event/cnxk/cnxk_tim_worker build failures on Ubuntu
20.04 cross-compile with clang 10
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: b...@iol.unh.edu
  Target Milestone: ---

Cross-compiling DPDK on Ubuntu 20.04 with clang 10.0.0-4ubuntu1 on latest main
(46c451d905e8e27787a0471ecc1d31a4cde25a9e)  

$ CC_FOR_BUILD=clang meson build --werror --cross-file
config/arm/arm64_armv8_linux_clang_ubuntu1804 -Dexamples=all --default-library
shared  

The Meson build system
Version: 0.58.0

Found ninja-1.10.0 at /usr/bin/ninja

[1901/2540] Compiling C object
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cnxk_tim_worker.c.o  
FAILED: drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cnxk_tim_worker.c.o 
clang -Idrivers/libtmp_rte_event_cnxk.a.p -Idrivers -I../drivers
-Idrivers/event/cnxk -I../drivers/event/cnxk -Ilib/eventdev -I../lib/eventdev
-I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include
-Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/arm/include
-I../lib/eal/arm/include -Ilib/eal/common -I../lib/eal/common -Ilib/eal
-I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/metrics -I../lib/metrics
-Ilib/telemetry -I../lib/telemetry -Ilib/ring -I../lib/ring -Ilib/ethdev
-I../lib/ethdev -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool
-I../lib/mempool -Ilib/meter -I../lib/meter -Ilib/hash -I../lib/hash -Ilib/rcu
-I../lib/rcu -Ilib/timer -I../lib/timer -Ilib/cryptodev -I../lib/cryptodev
-Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/pci
-I../lib/pci -Idrivers/common/cnxk -I../drivers/common/cnxk -fcolor-diagnostics
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O3 -include rte_config.h
-Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security
-Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes
-Wundef -Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -target aarch64-linux-gnu
--sysroot /usr/aarch64-linux-gnu -fPIC -march=armv8-a+crc
-DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API
-DRTE_LOG_DEFAULT_LOGTYPE=pmd.event.cnxk -MD -MQ
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cnxk_tim_worker.c.o -MF
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cnxk_tim_worker.c.o.d -o
drivers/libtmp_rte_event_cnxk.a.p/event_cnxk_cnxk_tim_worker.c.o -c
../drivers/event/cnxk/cnxk_tim_worker.c
In file included from ../drivers/event/cnxk/cnxk_tim_worker.c:6:
../drivers/event/cnxk/cnxk_tim_worker.h:372:23: error: value size does not
match register size specified by the constraint and modifier
[-Werror,-Wasm-operand-widths]
 : [rem] "=&r"(rem)
   ^
../drivers/event/cnxk/cnxk_tim_worker.h:365:17: note: use constraint modifier
"w"
 "  ldxr %[rem], [%[crem]]  \n"
 ^~
 %w[rem]
../drivers/event/cnxk/cnxk_tim_worker.h:372:23: error: value size does not
match register size specified by the constraint and modifier
[-Werror,-Wasm-operand-widths]
 : [rem] "=&r"(rem)
   ^
../drivers/event/cnxk/cnxk_tim_worker.h:366:16: note: use constraint modifier
"w"
 "  tbz %[rem], 63, dne%=   \n"
^~
%w[rem]
../drivers/event/cnxk/cnxk_tim_worker.h:372:23: error: value size does not
match register size specified by the constraint and modifier
[-Werror,-Wasm-operand-widths]
 : [rem] "=&r"(rem)
   ^
../drivers/event/cnxk/cnxk_tim_worker.h:369:17: note: use constraint modifier
"w"
 "  ldxr %[rem], [%[crem]]  \n"
 ^~
 %w[rem]
../drivers/event/cnxk/cnxk_tim_worker.h:372:23: error: value size does not
match register size specified by the constraint and modifier
[-Werror,-Wasm-operand-widths]
 : [rem] "=&r"(rem)
   ^
../drivers/event/cnxk/cnxk_tim_worker.h:370:17: note: use constraint modifier
"w"
 "  tbnz %[rem], 63, rty%=  \n"
 ^~
 %w[rem]
4 errors generated.
[1918/2540] Compiling C object
drivers/libtmp_rte_event_octeontx2.a.p/event_octeontx2_otx2_worker.c.o

-- 

Re: [dpdk-dev] [PATCH] lib/rte_rib6: fix stack buffer overflow

2021-06-16 Thread Medvedkin, Vladimir

Hi Owen,

Thanks for the fix.

I like your solution of removing the loop. However, while this fixes
the buffer overflow, IMO it is not complete, because get_dir() should not
be called when depth = 128; in that case, checking the MSB of the ip is
not quite the right thing.
The only place where it is possible (depth == 128) is on calling 
get_nxt_node() from rte_rib6_lookup(), so I would suggest adding 
something like this:


if (depth == 128)
return NULL;

to get_nxt_node() along with your changes.

Also, apart from Stephen's comments, please add the corresponding 
fixline to the v2.


Thanks!


On 16/06/2021 19:56, Stephen Hemminger wrote:

On Wed, 16 Jun 2021 12:07:29 -0400
ohily...@iol.unh.edu wrote:


From: Owen Hilyard 

ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 it caused the index
into the ip address to be 16, which read off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.

Signed-off-by: Owen Hilyard 
---
  lib/rib/rte_rib6.c | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c
index f6c55ee45..2de50449d 100644
--- a/lib/rib/rte_rib6.c
+++ b/lib/rib/rte_rib6.c
@@ -79,14 +79,20 @@ is_covered(const uint8_t ip1[RTE_RIB6_IPV6_ADDR_SIZE],
  static inline int
  get_dir(const uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE], uint8_t depth)
  {
-   int i = 0;
-   uint8_t p_depth, msk;
-
-   for (p_depth = depth; p_depth >= 8; p_depth -= 8)
-   i++;
-
-   msk = 1 << (7 - p_depth);
-   return (ip[i] & msk) != 0;
+   int index, msk;
+   /* depth & 127 clamps depth to values that will not


Please put a blank line after the declarations for clarity.
Since index and msk never hold negative values, please make them unsigned.
Better yet, size them to the appropriate number of bits.


+* read off the end of ip.
+* depth is the number of bits deep into ip to traverse, and
+* is incremented in blocks of 8 (1 byte). This means the last
+* 3 bits are irrelevant to what the index of ip should be.
+*/
+   index = (depth & 127) >> 3;
+   /*
+* msk is the bitmask used to extract the bit used to decide the
+* direction of the next step of the binary search.
+*/
+   msk = 1 << (7 - (depth & 7));
+   return (ip[index] & msk) != 0;
  }
  
  static inline struct rte_rib6_node *




--
Regards,
Vladimir


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> On 2021/6/16 0:38, Bruce Richardson wrote:
> > On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >> This patch introduces 'dmadevice' which is a generic type of DMA
> >> device.
> >>
> >> The APIs of dmadev library exposes some generic operations which can
> >> enable configuration and I/O with the DMA devices.
> >>
> >> Signed-off-by: Chengwen Feng 
> >> ---
> > Thanks for sending this.
> > 
> > Of most interest to me right now are the key data-plane APIs. While we are
> > still in the prototyping phase, below is a draft of what we are thinking
> > for the key enqueue/perform_ops/completed_ops APIs.
> > 
> > Some key differences I note in below vs your original RFC:
> > * Use of void pointers rather than iova addresses. While using iova's makes
> >   sense in the general case when using hardware, in that it can work with
> >   both physical addresses and virtual addresses, if we change the APIs to 
> > use
> >   void pointers instead it will still work for DPDK in VA mode, while at the
> >   same time allow use of software fallbacks in error cases, and also a stub
> >   driver than uses memcpy in the background. Finally, using iova's makes the
> >   APIs a lot more awkward to use with anything but mbufs or similar buffers
> >   where we already have a pre-computed physical address.
> 
> The iova is a hint to the application, and widely used in DPDK.
> If we switch to void pointers, how do we pass the address (iova or just va?);
> this may introduce implementation dependencies here.
> 
> Or we always pass the va and the driver performs address translation, and this
> translation may cost too much CPU, I think.
> 

On the latter point, about driver doing address translation I would agree.
However, we probably need more discussion about the use of iova vs just
virtual addresses. My thinking on this is that if we specify the API using
iovas it will severely hurt usability of the API, since it forces the user
to take more inefficient codepaths in a large number of cases. Given a
pointer to the middle of an mbuf, one cannot just pass that straight as an
iova, but must instead translate it into an offset from the mbuf pointer and
then re-add that offset to the mbuf base address.

My preference therefore is to require the use of an IOMMU when using a
dmadev, so that it can be a much closer analog of memcpy. Once an iommu is
present, DPDK will run in VA mode, allowing virtual addresses to our
hugepage memory to be sent directly to hardware. Also, when using
dmadevs on top of an in-kernel driver, that kernel driver may do all iommu
management for the app, removing further the restrictions on what memory
can be addressed by hardware.

> > * Use of id values rather than user-provided handles. Allowing the user/app
> >   to manage the amount of data stored per operation is a better solution, I
> >   feel than proscribing a certain about of in-driver tracking. Some apps may
> >   not care about anything other than a job being completed, while other apps
> >   may have significant metadata to be tracked. Taking the user-context
> >   handles out of the API also makes the driver code simpler.
> 
> The user-provided handle was mainly used to simplify the application
> implementation; it provides the ability to quickly locate contexts.
> 
> The "use of id values" seem like the dma_cookie of Linux DMA engine framework,
> user will get a unique dma_cookie after calling dmaengine_submit(), and then
> could use it to call dma_async_is_tx_complete() to get completion status.
> 

Yes, the idea of the id is the same - to locate contexts. The main
difference is that if we have the driver manage contexts or pointer to
contexts, as well as giving more work to the driver, it complicates the APIs
for measuring completions. If we use an ID-based approach, where the app
maintains its own ring of contexts (if any), it avoids the need to have an
"out" parameter array for returning those contexts, which needs to be
appropriately sized. Instead we can just report that all ids up to N are
completed. [This would be similar to your suggestion that N jobs be
reported as done, in that no contexts are provided, it's just that knowing
the ID of what is completed is generally more useful than the number (which
can obviously be obtained by subtracting the old value)]

We are still working on prototyping all this, but would hope to have a
functional example of all this soon.

> How about define the copy prototype as following:
>   dma_cookie_t rte_dmadev_copy(uint16_t dev_id, xxx)
> where dma_cookie_t is an int32 and is monotonically increasing; >= 0 means
> the enqueue succeeded, otherwise it failed.
> when complete the dmadev will return latest completed dma_cookie, and the
> application could use the dma_cookie to quick locate contexts.
> 

If I understand this correctly, I believe this is largely what I was
suggesting - just with the typedef for the type? In which case it obviously
looks good to me.

> > * I've kept
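A rough, software-only sketch of the cookie/id scheme being converged on here: an app-owned context ring, a monotonically increasing int32 cookie, and completion reported as the highest finished id. All names (`dma_cookie_t`, `toy_dma_*`) are hypothetical — the real dmadev API was still being prototyped in this thread — and memcpy stands in for the DMA engine, i.e. the software-fallback case mentioned above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t dma_cookie_t;

#define CTX_RING_SZ 64  /* app-owned ring of per-op contexts */

static void *ctx_ring[CTX_RING_SZ]; /* app's per-op user data */
static dma_cookie_t next_cookie;    /* monotonically increasing */
static dma_cookie_t last_done = -1; /* stands in for device state */

/* "Enqueue" a copy; memcpy stands in for the DMA engine. Returns the
 * op's cookie (>= 0) so the app can locate its context later. */
static dma_cookie_t
toy_dma_copy(void *dst, const void *src, size_t len, void *user_data)
{
	dma_cookie_t c = next_cookie++;

	ctx_ring[c % CTX_RING_SZ] = user_data;
	memcpy(dst, src, len);  /* pretend completion is immediate */
	last_done = c;
	return c;
}

/* Report the highest completed cookie; the app walks its own ctx ring
 * from its last seen id + 1 up to this value. */
static dma_cookie_t
toy_dma_completed(void)
{
	return last_done;
}
```

Because only the highest completed id is returned, no "out" array of contexts is needed: the app recovers each context as `ctx_ring[id % CTX_RING_SZ]`, which is the usability point Bruce makes above.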

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 05:01:46PM +0200, Morten Brørup wrote:
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Wednesday, 16 June 2021 15.03
> > 
> > On Wed, Jun 16, 2021 at 01:27:17PM +0200, Morten Brørup wrote:
> > > > From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> > > > Sent: Wednesday, 16 June 2021 11.42
> > > >
> > > > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon
> > 
> > > > wrote:
> > > > >
> > > > > 14/06/2021 17:48, Morten Brørup:
> > > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > > > Monjalon
> > > > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > > > something big enough to hold a sufficiently large array. And
> > possibly
> > > > add an rte_max_ethports variable to indicate the number of
> > populated
> > > > entries in the array, for use when iterating over the array.
> > > > > >
> > > > > > Can we come up with another example than RTE_MAX_ETHPORTS where
> > > > this library provides a better benefit?
> > > > >
> > > > > What is big enough?
> > > > > Is 640KB enough for RAM? ;)
> > > >
> > > > If I understand it correctly, a Linux process allocates 640KB due to
> > > > the fact that currently
> > > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
> > > > is in BSS.
> > >
> > > Correct.
> > >
> > > > If we allocate this from the heap, i.e. use malloc() to allocate this
> > > > memory, then in my understanding Linux
> > > > won't really allocate the backing pages for this memory until
> > > > someone writes to or reads from this memory.
> > >
> > > If the array is allocated from the heap, its members will be accessed
> > > through a pointer to the array, e.g. in rte_eth_rx/tx_burst(). This
> > > might affect performance, which is probably why the array is allocated
> > > the way it is.
> > >
> > 
> > It depends on whether the array contains pointers to malloced elements
> > or
> > the array itself is just a single malloced array of all the structures.
> > While I think the parray proposal referred to the former - which would
> > have
> > an extra level of indirection - the switch we are discussing here is
> > the
> > latter which should have no performance difference, since the method of
> > accessing the elements will be the same, only with the base address
> > pointing to a different area of memory.
> 
> I was not talking about an array of pointers. And it is not the same:
> 
> int arr[27];
> int * parr = arr;
> 
> // direct access
> int dir(int i) { return arr[i]; }
> 
> // indirect access
> int indir(int i) { return parr[i]; }
> 
> The direct access knows the address of arr, so it will compile to:
> movsx   rdi, edi
> mov eax, DWORD PTR arr[0+rdi*4]
> ret
> 
> The indirect access needs to first read the memory location holding the 
> pointer to the array, and then it can read the array member, so it will 
> compile to:
> mov rax, QWORD PTR parr[rip]
> movsx   rdi, edi
> mov eax, DWORD PTR [rax+rdi*4]
> ret
> 
Interesting, thanks. Definitely seems like a bit of perf testing will be
needed whatever way we go.


Re: [dpdk-dev] Memory leak in rte_pci_scan

2021-06-16 Thread David Marchand
On Wed, Jun 16, 2021 at 6:27 PM Owen Hilyard  wrote:
>> - For the fast-tests testsuite, the default timeout should be 10s, not 600s.
>> See timeout_seconds_fast,
>> https://git.dpdk.org/dpdk/tree/app/test/meson.build#n446
>> Odd that a 600s timeout has been applied to fast-tests in your run.
>> How do you invoke meson?
>
>
> # meson test -t 600
>
> I copied the invocation from the production scripts for the community lab and 
> removed the --suite argument.

600?
-t is for timeout multiplier.
The default timeout for fast tests is 10s and the logs in a previous
mail show 600s for timeout, so I would expect a 60 multiplier.


>
>> It seems like there are multiple dpdk processes running in // in this
>> environment.
>> Any idea of what is happening on your system at the moment you tried
>> to run this test?
>
>
> I ran this on a VM that we keep in the same state as the production container 
> runners. It is not attached to our Jenkins instance, and I was the only 
> logged-in user.  I re-ran the test suite with and without ASAN, and it seems 
> like this type of failure only happens when ASAN is active. The failing tests 
> are: eal_flags_a_opt_autotest, eal_flags_b_opt_autotest, 
> eal_flags_c_opt_autotest, eal_flags_main_opt_autotest, 
> eal_flags_misc_autotest. I've attached the log.

ASAN seems to break some assumption on the default virtual base
address used by the mp stuff.
It might be a reason for the secondary process init failure.

Still, we have probably a deadlock here, since the test should fail in
a reasonable amount of time.

My guess would be at some secondary process not releasing a lock and
the primary ends up waiting on it.
Here, a secondary process did not initialise correctly, but it tried
to clean up afterwards... by any chance, do you have a crash reported in
syslog?


-- 
David Marchand



[dpdk-dev] [PATCH] tests/cmdline: fix memory leaks

2021-06-16 Thread ohilyard
From: Owen Hilyard 

Fixes for a few memory leaks in the cmdline_autotest unit test.

All of the leaks were related to not freeing the commandline struct
after testing had completed.

Fixes: dbb860e03e ("cmdline: tests")

Signed-off-by: Owen Hilyard 
Reviewed-by: David Marchand 
---
 app/test/test_cmdline_lib.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/app/test/test_cmdline_lib.c b/app/test/test_cmdline_lib.c
index bd72df0da..19228c9a5 100644
--- a/app/test/test_cmdline_lib.c
+++ b/app/test/test_cmdline_lib.c
@@ -71,10 +71,12 @@ test_cmdline_parse_fns(void)
if (cmdline_complete(cl, "buffer", &i, NULL, sizeof(dst)) >= 0)
goto error;
 
+   cmdline_free(cl);
return 0;
 
 error:
printf("Error: function accepted null parameter!\n");
+   cmdline_free(cl);
return -1;
 }
 
@@ -140,32 +142,43 @@ static int
 test_cmdline_socket_fns(void)
 {
cmdline_parse_ctx_t ctx;
+   struct cmdline *cl;
 
-   if (cmdline_stdin_new(NULL, "prompt") != NULL)
+   cl = cmdline_stdin_new(NULL, "prompt");
+   if (cl != NULL)
goto error;
-   if (cmdline_stdin_new(&ctx, NULL) != NULL)
+   cl = cmdline_stdin_new(&ctx, NULL);
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(NULL, "prompt", "/dev/null") != NULL)
+   cl = cmdline_file_new(NULL, "prompt", "/dev/null");
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(&ctx, NULL, "/dev/null") != NULL)
+   cl = cmdline_file_new(&ctx, NULL, "/dev/null");
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(&ctx, "prompt", NULL) != NULL)
+   cl = cmdline_file_new(&ctx, "prompt", NULL);
+   if (cl != NULL)
goto error;
-   if (cmdline_file_new(&ctx, "prompt", "-/invalid/~/path") != NULL) {
+   cl = cmdline_file_new(&ctx, "prompt", "-/invalid/~/path");
+   if (cl != NULL) {
printf("Error: succeeded in opening invalid file for reading!");
+   cmdline_free(cl);
return -1;
}
-   if (cmdline_file_new(&ctx, "prompt", "/dev/null") == NULL) {
+   cl = cmdline_file_new(&ctx, "prompt", "/dev/null");
+   if (cl == NULL) {
printf("Error: failed to open /dev/null for reading!");
+   cmdline_free(cl);
return -1;
}
 
/* void functions */
cmdline_stdin_exit(NULL);
-
+   cmdline_free(cl);
return 0;
 error:
printf("Error: function accepted null parameter!\n");
+   cmdline_free(cl);
return -1;
 }
 
@@ -198,6 +211,7 @@ test_cmdline_fns(void)
cmdline_interact(NULL);
cmdline_quit(NULL);
 
+   cmdline_free(cl);
return 0;
 
 error:
-- 
2.30.2



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Jerin Jacob
On Wed, Jun 16, 2021 at 11:01 PM Bruce Richardson
 wrote:
>
> On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> > On 2021/6/16 0:38, Bruce Richardson wrote:
> > > On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > >> This patch introduces 'dmadevice' which is a generic type of DMA
> > >> device.
> > >>
> > >> The APIs of dmadev library exposes some generic operations which can
> > >> enable configuration and I/O with the DMA devices.
> > >>
> > >> Signed-off-by: Chengwen Feng 
> > >> ---
> > > Thanks for sending this.
> > >
> > > Of most interest to me right now are the key data-plane APIs. While we are
> > > still in the prototyping phase, below is a draft of what we are thinking
> > > for the key enqueue/perform_ops/completed_ops APIs.
> > >
> > > Some key differences I note in below vs your original RFC:
> > > * Use of void pointers rather than iova addresses. While using iova's 
> > > makes
> > >   sense in the general case when using hardware, in that it can work with
> > >   both physical addresses and virtual addresses, if we change the APIs to 
> > > use
> > >   void pointers instead it will still work for DPDK in VA mode, while at 
> > > the
> > >   same time allow use of software fallbacks in error cases, and also a 
> > > stub
> > >   driver than uses memcpy in the background. Finally, using iova's makes 
> > > the
> > >   APIs a lot more awkward to use with anything but mbufs or similar 
> > > buffers
> > >   where we already have a pre-computed physical address.
> >
> > The iova is an hint to application, and widely used in DPDK.
> > If switch to void, how to pass the address (iova or just va ?)
> > this may introduce implementation dependencies here.
> >
> > Or always pass the va, and the driver performs address translation, and this
> > translation may cost too much cpu I think.
> >
>
> On the latter point, about driver doing address translation I would agree.
> However, we probably need more discussion about the use of iova vs just
> virtual addresses. My thinking on this is that if we specify the API using
> iovas it will severely hurt usability of the API, since it forces the user
> to take more inefficient codepaths in a large number of cases. Given a
> pointer to the middle of an mbuf, one cannot just pass that straight as an
> iova but must instead do a translation into offset from mbuf pointer and
> then readd the offset to the mbuf base address.
>
> My preference therefore is to require the use of an IOMMU when using a
> dmadev, so that it can be a much closer analog of memcpy. Once an iommu is
> present, DPDK will run in VA mode, allowing virtual addresses to our
> hugepage memory to be sent directly to hardware. Also, when using
> dmadevs on top of an in-kernel driver, that kernel driver may do all iommu
> management for the app, removing further the restrictions on what memory
> can be addressed by hardware.


One issue with keeping void * is that memory can come from the stack or
heap, which HW cannot really operate on. Considering the difficulty of
expressing the above constraints, IMO iova is good (so that the contract
is clear between driver and application), or we need some other means to
express that constraint.


[dpdk-dev] [PATCH v2] lib/rte_rib6: fix stack buffer overflow

2021-06-16 Thread ohilyard
From: Owen Hilyard 

ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 it caused the index
into the ip address to be 16, which read off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.

Fixes: f7e861e21c ("rib: support IPv6")

Signed-off-by: Owen Hilyard 
---
 lib/rib/rte_rib6.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/rib/rte_rib6.c b/lib/rib/rte_rib6.c
index f6c55ee45..a4daf12ca 100644
--- a/lib/rib/rte_rib6.c
+++ b/lib/rib/rte_rib6.c
@@ -79,20 +79,31 @@ is_covered(const uint8_t ip1[RTE_RIB6_IPV6_ADDR_SIZE],
 static inline int
 get_dir(const uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE], uint8_t depth)
 {
-   int i = 0;
-   uint8_t p_depth, msk;
-
-   for (p_depth = depth; p_depth >= 8; p_depth -= 8)
-   i++;
-
-   msk = 1 << (7 - p_depth);
-   return (ip[i] & msk) != 0;
+   uint8_t index, msk;
+
+   /* depth & 127 clamps depth to values that will not
+* read off the end of ip.
+* depth is the number of bits deep into ip to traverse, and
+* is incremented in blocks of 8 (1 byte). This means the last
+* 3 bits are irrelevant to what the index of ip should be.
+*/
+   index = (depth & 127) >> 3;
+
+   /*
+* msk is the bitmask used to extract the bit used to decide the
+* direction of the next step of the binary search.
+*/
+   msk = 1 << (7 - (depth & 7));
+
+   return (ip[index] & msk) != 0;
 }
 
 static inline struct rte_rib6_node *
 get_nxt_node(struct rte_rib6_node *node,
const uint8_t ip[RTE_RIB6_IPV6_ADDR_SIZE])
 {
+   if (node->depth == 128)
+   return NULL;
return (get_dir(ip, node->depth)) ? node->right : node->left;
 }
 
-- 
2.30.2



[dpdk-dev] [PATCH] net/mlx5: do not allow copy to mark via modify field

2021-06-16 Thread Alexander Kozyrev
Mark requires a tag resource to be registered as part of
the value assigning. It is not possible during a copy
operation from a packet field. Forbid this in MODIFY_FIELD.

Signed-off-by: Alexander Kozyrev 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dafd37ab93..26b901e32e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -4797,10 +4797,11 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev 
*dev,
"source and destination fields"
" cannot be the same");
if (action_modify_field->dst.field == RTE_FLOW_FIELD_VALUE ||
-   action_modify_field->dst.field == RTE_FLOW_FIELD_POINTER)
+   action_modify_field->dst.field == RTE_FLOW_FIELD_POINTER ||
+   action_modify_field->dst.field == RTE_FLOW_FIELD_MARK)
return rte_flow_error_set(error, EINVAL,
RTE_FLOW_ERROR_TYPE_ACTION, action,
-   "immediate value or a pointer to it"
+   "mark, immediate value or a pointer to it"
" cannot be used as a destination");
if (action_modify_field->dst.field == RTE_FLOW_FIELD_START ||
action_modify_field->src.field == RTE_FLOW_FIELD_START)
-- 
2.18.2



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 04:48:59PM +, Honnappa Nagarahalli wrote:
> 
> 
> > 
> > On Wed, Jun 16, 2021 at 02:14:54PM +0200, David Marchand wrote:
> > > On Tue, Jun 15, 2021 at 3:25 PM Chengwen Feng
> >  wrote:
> > > > +
> > > > +#define RTE_DMADEV_NAME_MAX_LEN(64)
> > > > +/**< @internal Max length of name of DMA PMD */
> > > > +
> > > > +/** @internal
> > > > + * The data structure associated with each DMA device.
> > > > + */
> > > > +struct rte_dmadev {
> > > > +   /**< Device ID for this instance */
> > > > +   uint16_t dev_id;
> > > > +   /**< Functions exported by PMD */
> > > > +   const struct rte_dmadev_ops *dev_ops;
> > > > +   /**< Device info. supplied during device initialization */
> > > > +   struct rte_device *device;
> > > > +   /**< Driver info. supplied by probing */
> > > > +   const char *driver_name;
> > > > +
> > > > +   /**< Device name */
> > > > +   char name[RTE_DMADEV_NAME_MAX_LEN]; } __rte_cache_aligned;
> > > > +
> > >
> > > I see no queue/channel notion.
> > > How does a rte_dmadev object relate to a physical hw engine?
> > >
> > One queue, one device.
> > When looking to update the ioat driver for 20.11 release when I added the
> > idxd part, I considered adding a queue parameter to the API to look like one
> > device with multiple queues. However, since each queue acts completely
> > independently of each other, there was no benefit to doing so. It's just 
> > easier
> > to have a single id to identify a device queue.
> Does it mean, the queue is multi thread safe? Do we need queues per core to 
> avoid locking?

The design is for each queue to be like the queue on a NIC, not
thread-safe. However, if the hardware supports thread-safe queues too, that
can be supported. But the API should be like other data-plane ones and be
lock free.

For the DMA devices that I am working on, the number of queues
is not very large, and in most cases each queue appears as a separate
entity, e.g. for ioat each queue/channel appears as a separate PCI ID, and
when using the idxd kernel driver each queue is a separate dev node to mmap.
For other cases, right now we just create one rawdev instance per queue in
software.

/Bruce


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 11:38:08PM +0530, Jerin Jacob wrote:
> On Wed, Jun 16, 2021 at 11:01 PM Bruce Richardson
>  wrote:
> >
> > On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> > > On 2021/6/16 0:38, Bruce Richardson wrote:
> > > > On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > >> This patch introduces 'dmadevice' which is a generic type of DMA
> > > >> device.
> > > >>
> > > >> The APIs of dmadev library exposes some generic operations which can
> > > >> enable configuration and I/O with the DMA devices.
> > > >>
> > > >> Signed-off-by: Chengwen Feng 
> > > >> ---
> > > > Thanks for sending this.
> > > >
> > > > Of most interest to me right now are the key data-plane APIs. While we 
> > > > are
> > > > still in the prototyping phase, below is a draft of what we are thinking
> > > > for the key enqueue/perform_ops/completed_ops APIs.
> > > >
> > > > Some key differences I note in below vs your original RFC:
> > > > * Use of void pointers rather than iova addresses. While using iova's 
> > > > makes
> > > >   sense in the general case when using hardware, in that it can work 
> > > > with
> > > >   both physical addresses and virtual addresses, if we change the APIs 
> > > > to use
> > > >   void pointers instead it will still work for DPDK in VA mode, while 
> > > > at the
> > > >   same time allow use of software fallbacks in error cases, and also a 
> > > > stub
> > > >   driver than uses memcpy in the background. Finally, using iova's 
> > > > makes the
> > > >   APIs a lot more awkward to use with anything but mbufs or similar 
> > > > buffers
> > > >   where we already have a pre-computed physical address.
> > >
> > > The iova is an hint to application, and widely used in DPDK.
> > > If switch to void, how to pass the address (iova or just va ?)
> > > this may introduce implementation dependencies here.
> > >
> > > Or always pass the va, and the driver performs address translation, and 
> > > this
> > > translation may cost too much cpu I think.
> > >
> >
> > On the latter point, about driver doing address translation I would agree.
> > However, we probably need more discussion about the use of iova vs just
> > virtual addresses. My thinking on this is that if we specify the API using
> > iovas it will severely hurt usability of the API, since it forces the user
> > to take more inefficient codepaths in a large number of cases. Given a
> > pointer to the middle of an mbuf, one cannot just pass that straight as an
> > iova but must instead do a translation into offset from mbuf pointer and
> > then readd the offset to the mbuf base address.
> >
> > My preference therefore is to require the use of an IOMMU when using a
> > dmadev, so that it can be a much closer analog of memcpy. Once an iommu is
> > present, DPDK will run in VA mode, allowing virtual addresses to our
> > hugepage memory to be sent directly to hardware. Also, when using
> > dmadevs on top of an in-kernel driver, that kernel driver may do all iommu
> > management for the app, removing further the restrictions on what memory
> > can be addressed by hardware.
> 
> 
> One issue of keeping void * is that memory can come from stack or heap .
> which HW can not really operate it on.

When the kernel driver is managing the IOMMU, all process memory can be
worked on, not just hugepage memory, so using iova is wrong in these cases.

As I previously said, using iova also prevents the creation of a pure
software dummy driver that uses memcpy in the background.

/Bruce


Re: [dpdk-dev] [PATCH 13/20] crypto/cnxk: add flexi crypto cipher encrypt

2021-06-16 Thread Akhil Goyal
> 
> diff --git a/doc/guides/cryptodevs/features/cn10k.ini
> b/doc/guides/cryptodevs/features/cn10k.ini
> index 175fbf7..f097d8e 100644
> --- a/doc/guides/cryptodevs/features/cn10k.ini
> +++ b/doc/guides/cryptodevs/features/cn10k.ini
> @@ -7,6 +7,10 @@
>  Symmetric crypto   = Y
>  Sym operation chaining = Y
>  HW Accelerated = Y
> +In Place SGL   = Y
> +OOP SGL In LB  Out = Y
> +OOP SGL In SGL Out = Y
> +OOP LB  In LB  Out = Y
>  Symmetric sessionless  = Y
>  Digest encrypted   = Y
> 
> @@ -14,6 +18,18 @@ Digest encrypted   = Y
>  ; Supported crypto algorithms of 'cn10k' crypto driver.
>  ;
>  [Cipher]
> +NULL   = Y
> +3DES CBC   = Y
> +3DES ECB   = Y
> +AES CBC (128)  = Y
> +AES CBC (192)  = Y
> +AES CBC (256)  = Y
> +AES CTR (128)  = Y
> +AES CTR (192)  = Y
> +AES CTR (256)  = Y
> +AES XTS (128)  = Y
> +AES XTS (256)  = Y
> +DES CBC= Y
> 

It would be better to add all the algos in the .ini file along with the
capabilities patch, after flexi crypto cipher decrypt (14/20).

>  ;
>  ; Supported authentication algorithms of 'cn10k' crypto driver.
> diff --git a/doc/guides/cryptodevs/features/cn9k.ini
> b/doc/guides/cryptodevs/features/cn9k.ini
> index c22b25c..7007d11 100644
> --- a/doc/guides/cryptodevs/features/cn9k.ini
> +++ b/doc/guides/cryptodevs/features/cn9k.ini
> @@ -7,6 +7,10 @@
>  Symmetric crypto   = Y
>  Sym operation chaining = Y
>  HW Accelerated = Y
> +In Place SGL   = Y
> +OOP SGL In LB  Out = Y
> +OOP SGL In SGL Out = Y
> +OOP LB  In LB  Out = Y
>  Symmetric sessionless  = Y
>  Digest encrypted   = Y
> 
> @@ -14,6 +18,18 @@ Digest encrypted   = Y
>  ; Supported crypto algorithms of 'cn9k' crypto driver.
>  ;
>  [Cipher]
> +NULL   = Y
> +3DES CBC   = Y
> +3DES ECB   = Y
> +AES CBC (128)  = Y
> +AES CBC (192)  = Y
> +AES CBC (256)  = Y
> +AES CTR (128)  = Y
> +AES CTR (192)  = Y
> +AES CTR (256)  = Y
> +AES XTS (128)  = Y
> +AES XTS (256)  = Y
> +DES CBC= Y
> 
>  ;
>  ; Supported authentication algorithms of 'cn9k' crypto driver.
> @@ -24,3 +40,7 @@ Digest encrypted   = Y
>  ; Supported AEAD algorithms of 'cn9k' crypto driver.
>  ;
>  [AEAD]
> +AES GCM (128) = Y
> +AES GCM (192) = Y
> +AES GCM (256) = Y
> +CHACHA20-POLY1305 = Y

AEAD is added in cn9k but not in cn10k in this patch.
It would be better to have all algos added in the .ini file along with the
capabilities after the flexi decrypt patch.

The ZUC/SNOW3G/KASUMI updates in the .ini file can be added in later
patches, as is done in the current set.





Re: [dpdk-dev] [PATCH 15/20] crypto/cnxk: add ZUC and SNOW3G encrypt

2021-06-16 Thread Akhil Goyal
> diff --git a/doc/guides/cryptodevs/features/cn10k.ini
> b/doc/guides/cryptodevs/features/cn10k.ini
> index f097d8e..8f20d07 100644
> --- a/doc/guides/cryptodevs/features/cn10k.ini
> +++ b/doc/guides/cryptodevs/features/cn10k.ini
> @@ -30,6 +30,8 @@ AES CTR (256)  = Y
>  AES XTS (128)  = Y
>  AES XTS (256)  = Y
>  DES CBC= Y
> +SNOW3G UEA2= Y
> +ZUC EEA3   = Y
> 
ZUC and SNOW3G are added in the documentation, but decryption is added in
the next patch. It would be better to squash the encrypt and decrypt
patches, or update the .ini file in the decrypt patch when the
functionality is complete.




Re: [dpdk-dev] [PATCH 17/20] crypto/cnxk: add KASUMI encrypt

2021-06-16 Thread Akhil Goyal
> diff --git a/doc/guides/cryptodevs/features/cn10k.ini
> b/doc/guides/cryptodevs/features/cn10k.ini
> index 8f20d07..23ec100 100644
> --- a/doc/guides/cryptodevs/features/cn10k.ini
> +++ b/doc/guides/cryptodevs/features/cn10k.ini
> @@ -30,6 +30,7 @@ AES CTR (256)  = Y
>  AES XTS (128)  = Y
>  AES XTS (256)  = Y
>  DES CBC= Y
> +KASUMI F8  = Y
>  SNOW3G UEA2= Y
>  ZUC EEA3   = Y
> 
Same comment as in 15/20 patch.




Re: [dpdk-dev] [PATCH 00/20] Add Marvell CNXK crypto PMDs

2021-06-16 Thread Akhil Goyal
> 
> Add cnxk crypto PMDs supporting Marvell CN106XX SoC, based on
> 'common/cnxk'.
> 
> This series utilizes 'common/cnxk' to register cn9k & cn10k crypto PMDs and
> add symmetric cryptographic features for the same.
> 
> Depends-on: series-17212 ("Add CPT in Marvell CNXK common driver")
> 
Release notes and documentation of the PMD is missing.


[dpdk-dev] [PATCH] lib/flow_classify: fix leaking rules on delete

2021-06-16 Thread ohilyard
From: Owen Hilyard 

Rules in a classify table were not freed if the table
had a delete function.

Fixes: be41ac2a3 ("flow_classify: introduce flow classify library")

Signed-off-by: Owen Hilyard 
---
 lib/flow_classify/rte_flow_classify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/flow_classify/rte_flow_classify.c 
b/lib/flow_classify/rte_flow_classify.c
index f125267e8..06aed3b70 100644
--- a/lib/flow_classify/rte_flow_classify.c
+++ b/lib/flow_classify/rte_flow_classify.c
@@ -579,7 +579,7 @@ rte_flow_classify_table_entry_delete(struct 
rte_flow_classifier *cls,
&rule->u.key.key_del,
&rule->key_found,
&rule->entry);
-
+   free(rule);
return ret;
}
}
-- 
2.30.2



Re: [dpdk-dev] [PATCH 01/20] crypto/cnxk: add driver skeleton

2021-06-16 Thread Akhil Goyal
> From: Ankur Dwivedi 
> 
> Add driver skeleton for crypto_cn9k & crypto_cn10k PMDs leveraging cnxk
> common framework.
> 
> Signed-off-by: Ankur Dwivedi 
> Signed-off-by: Anoob Joseph 
> Signed-off-by: Archana Muniganti 
> Signed-off-by: Tejasree Kondoj 
> ---
>  MAINTAINERS  |  9 +++
>  doc/guides/cryptodevs/features/cn10k.ini | 21 
>  doc/guides/cryptodevs/features/cn9k.ini  | 21 
>  drivers/crypto/cnxk/cn10k_cryptodev.c| 42
> 
>  drivers/crypto/cnxk/cn10k_cryptodev.h| 13 ++
>  drivers/crypto/cnxk/cn9k_cryptodev.c | 40
> ++
>  drivers/crypto/cnxk/cn9k_cryptodev.h | 13 ++
>  drivers/crypto/cnxk/meson.build  | 16 
>  drivers/crypto/cnxk/version.map  |  3 +++
>  drivers/crypto/meson.build   |  1 +
>  10 files changed, 179 insertions(+)
>  create mode 100644 doc/guides/cryptodevs/features/cn10k.ini
>  create mode 100644 doc/guides/cryptodevs/features/cn9k.ini
>  create mode 100644 drivers/crypto/cnxk/cn10k_cryptodev.c
>  create mode 100644 drivers/crypto/cnxk/cn10k_cryptodev.h
>  create mode 100644 drivers/crypto/cnxk/cn9k_cryptodev.c
>  create mode 100644 drivers/crypto/cnxk/cn9k_cryptodev.h
>  create mode 100644 drivers/crypto/cnxk/meson.build
>  create mode 100644 drivers/crypto/cnxk/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16..ecfd1a4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1080,6 +1080,15 @@ F: drivers/crypto/octeontx2/
>  F: doc/guides/cryptodevs/octeontx2.rst
>  F: doc/guides/cryptodevs/features/octeontx2.ini
> 
> +Marvell cnxk
> +M: Ankur Dwivedi 
> +M: Anoob Joseph 
> +M: Tejasree Kondoj 
> +F: drivers/crypto/cnxk/
> +F: doc/guides/cryptodevs/cnxk.rst

File is added in MAINTAINERS but is not part of this patch.

> +F: doc/guides/cryptodevs/features/cn9k.ini
> +F: doc/guides/cryptodevs/features/cn10k.ini
> +



Re: [dpdk-dev] [PATCH 3/4] crypto/cnxk: add security session ops

2021-06-16 Thread Akhil Goyal
> diff --git a/drivers/crypto/cnxk/meson.build
> b/drivers/crypto/cnxk/meson.build
> index ab45483..eea08fa 100644
> --- a/drivers/crypto/cnxk/meson.build
> +++ b/drivers/crypto/cnxk/meson.build
> @@ -13,6 +13,7 @@ sources = files(
>  'cn9k_cryptodev_ops.c',
>  'cn10k_cryptodev.c',
>  'cn10k_cryptodev_ops.c',
> +'cn10k_ipsec.c',
>  'cnxk_cpt_ops_helper.c',
>  'cnxk_cryptodev.c',
>  'cnxk_cryptodev_capabilities.c',
> @@ -20,4 +21,4 @@ sources = files(
>  'cnxk_cryptodev_sec.c',
>  )
> 
> -deps += ['bus_pci', 'common_cnxk', 'security']
> +deps += ['bus_pci', 'common_cnxk', 'security', 'rte_net']
> --
This should be 'net' and not 'rte_net'.
Do we really need this dependency?



Re: [dpdk-dev] [PATCH 1/4] crypto/cnxk: add security ctx skeleton

2021-06-16 Thread Akhil Goyal
> From: Srujana Challa 
> 
> Add security ctx in cn10k crypto PMD.
> 
> Signed-off-by: Anoob Joseph 
> Signed-off-by: Srujana Challa 
> Signed-off-by: Tejasree Kondoj 
> ---
>  drivers/crypto/cnxk/cn10k_cryptodev.c| 10 +++
>  drivers/crypto/cnxk/cnxk_cryptodev_sec.c | 47
> 
>  drivers/crypto/cnxk/cnxk_cryptodev_sec.h | 14 ++
>  drivers/crypto/cnxk/meson.build  |  3 +-
>  4 files changed, 73 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_sec.c
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_sec.h
> 
> diff --git a/drivers/crypto/cnxk/cn10k_cryptodev.c
> b/drivers/crypto/cnxk/cn10k_cryptodev.c
> index ca3adea..b58d390 100644
> --- a/drivers/crypto/cnxk/cn10k_cryptodev.c
> +++ b/drivers/crypto/cnxk/cn10k_cryptodev.c
> @@ -14,6 +14,7 @@
>  #include "cn10k_cryptodev_ops.h"
>  #include "cnxk_cryptodev.h"
>  #include "cnxk_cryptodev_capabilities.h"
> +#include "cnxk_cryptodev_sec.h"
> 
>  #include "roc_api.h"
> 
> @@ -75,6 +76,11 @@ cn10k_cpt_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
>   plt_err("Failed to add engine group rc=%d", rc);
>   goto dev_fini;
>   }
> +
> + /* Create security context */
> + rc = cnxk_crypto_sec_ctx_create(dev);
> + if (rc)
> + goto dev_fini;
>   }
> 
>   cnxk_cpt_caps_populate(vf);
> @@ -87,6 +93,7 @@ cn10k_cpt_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
>RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT |
>RTE_CRYPTODEV_FF_OOP_SGL_IN_SGL_OUT |
>RTE_CRYPTODEV_FF_SYM_SESSIONLESS |
> +  RTE_CRYPTODEV_FF_SECURITY |
>RTE_CRYPTODEV_FF_DIGEST_ENCRYPTED;

The corresponding change in the .ini file is missing. Moreover, you should
add it in the last patch of this series, when your feature is complete.
Both the feature flag and the documentation in the .ini file should be in
the same patch.


Re: [dpdk-dev] [PATCH 0/4] Add rte_security in crypto_cn10k PMD

2021-06-16 Thread Akhil Goyal
> Add rte_security (lookaside protocol - IPsec) support in crypto_cn10k.
> 
> IPsec operations can be offloaded to CPT's SE and IE engines, which
> can process IPsec protcol operations including atomic sequence number
> increment (for outbound operations) and anti replay window check (for
> inbound operations).
> 
> Depends-on: series-17212 ("Add CPT in Marvell CNXK common driver")
> Depends-on: series-17213 ("Add Marvell CNXK crypto PMDs")
> 
Do you need any update in the documentation of the PMD for this patchset?
Please also update the release notes appropriately.


Re: [dpdk-dev] [PATCH 1/3] crypto/cnxk: add asymmetric session ops

2021-06-16 Thread Akhil Goyal
> From: Kiran Kumar K 
> 
> Adding asymmetric crypto session ops.
> 
> Signed-off-by: Kiran Kumar K 
> ---
>  drivers/crypto/cnxk/cn10k_cryptodev.c |   2 +
>  drivers/crypto/cnxk/cn10k_cryptodev_ops.c |   6 +-
>  drivers/crypto/cnxk/cn9k_cryptodev.c  |   4 +-
>  drivers/crypto/cnxk/cn9k_cryptodev_ops.c  |   6 +-
>  drivers/crypto/cnxk/cnxk_ae.h | 210
> ++
>  drivers/crypto/cnxk/cnxk_cpt_ops_helper.c |  14 ++
>  drivers/crypto/cnxk/cnxk_cpt_ops_helper.h |   4 +
>  drivers/crypto/cnxk/cnxk_cryptodev.h  |   3 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  75 +++
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   8 ++
>  10 files changed, 324 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/crypto/cnxk/cnxk_ae.h
> 
> diff --git a/drivers/crypto/cnxk/cn10k_cryptodev.c
> b/drivers/crypto/cnxk/cn10k_cryptodev.c
> index 9517e62..84a1a3a 100644
> --- a/drivers/crypto/cnxk/cn10k_cryptodev.c
> +++ b/drivers/crypto/cnxk/cn10k_cryptodev.c
> @@ -87,7 +87,9 @@ cn10k_cpt_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
>   cnxk_cpt_caps_populate(vf);
> 
>   dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
> +  RTE_CRYPTODEV_FF_ASYMMETRIC_CRYPTO |
>RTE_CRYPTODEV_FF_HW_ACCELERATED |
> +  RTE_CRYPTODEV_FF_RSA_PRIV_OP_KEY_QT |
>RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING
> |
>RTE_CRYPTODEV_FF_IN_PLACE_SGL |
>RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |

Same comment as in the ipsec series.

Documentation update
Move above change and .ini file update in patch 2/3
Release notes update.




Re: [dpdk-dev] [PATCH 3/3] app/test: adding cnxk asymmetric autotest

2021-06-16 Thread Akhil Goyal
> Subject: [PATCH 3/3] app/test: adding cnxk asymmetric autotest
> 
Title should be 
test/crypto: add cnxk for asymmetric cases

> From: Kiran Kumar K 
> 
> Adding autotest for cn9k and cn10k.
> 
> Signed-off-by: Kiran Kumar K 



[dpdk-dev] FreeBSD 13 Memory / Contigmem Issues With Booting

2021-06-16 Thread Brandon Lo
Hi everyone,

I have been trying to expand the UNH IOL unit testing coverage over to
FreeBSD 13.
Following this documentation:
https://doc.dpdk.org/guides/freebsd_gsg/build_dpdk.html#loading-the-dpdk-contigmem-module

I compiled the contigmem kernel module from the latest DPDK v21.05 and
set the variables in /boot/loader.conf.
This results in a kernel trap 12 on boot with a machine that has 4GB
of RAM available and 1 GB assigned for contigmem.

If I only set the loader.conf to load the contigmem kernel module
without setting any variables, then it boots fine.
However, using these default settings, DPDK unit tests (fast-suite)
fail in some of the test cases.

Is this a memory-related issue where the virtual machine does not have
enough memory allocated (if so, how much RAM is needed?),
or is it related to compatibility issues between the different FreeBSD versions?

Thanks,
Brandon

-- 
Brandon Lo

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

b...@iol.unh.edu

www.iol.unh.edu


Re: [dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW scanning func

2021-06-16 Thread Honnappa Nagarahalli


> > > > >
> > > > > > >
> > > > > > > Add the logic to determine how many DD bits have been set
> > > > > > > for contiguous packets, for removing the SMP barrier while reading
> descs.
> > > > > >
> > > > > > I didn't understand this.
> > > > > > The current logic already guarantee the read out DD bits are
> > > > > > from continue packets, as it read Rx descriptor in a reversed
> > > > > > order from the
> > > > ring.
> > > > > Qi, the comments in the code mention that there is a race
> > > > > condition if the descriptors are not read in the reverse order.
> > > > > But, they do not mention what the race condition is and how it can
> occur.
> > > > > Appreciate if you could explain that.
> > > >
> > > > The Race condition happens between the NIC and CPU, if write and
> > > > read DD bit in the same order, there might be a hole (e.g. 1011)
> > > > with the reverse read order, we make sure no more "1" after the first 
> > > > "0"
> > > > as the read address are declared as volatile, compiler will not
> > > > re-ordered them.
> > > My understanding is that
> > >
> > > 1) the NIC will write an entire cache line of descriptors to memory
> "atomically"
> > > (i.e. the entire cache line is visible to the CPU at once) if there
> > > are enough descriptors ready to fill one cache line.
> > > 2) But, if there are not enough descriptors ready (because for ex:
> > > there is not enough traffic), then it might write partial cache lines.
> >
> > Yes, for example a cache line contains 4 x16 bytes descriptors and it is
> possible we get 1 1 1 0 for DD bit at some moment.
> >
> > >
> > > Please correct me if I am wrong.
> > >
> > > For #1, I do not think it matters if we read the descriptors in
> > > reverse order or not as the cache line is written atomically.
> >
> > I think below cases may happens if we don't read in reserve order.
> >
> > 1. CPU get first cache line as 1 1 1 0 in a loop 2. new packets coming
> > and NIC append last 1 to the first cache and a new cache line with 1 1 1 1.
> > 3. CPU continue new cache line with 1 1 1 1 in the same loop, but the last 1
> of first cache line is missed, so finally it get 1 1 1 0 1 1 1 1.
> >
> 
> The one-sentence answer here is: when two entities are moving along a line in
> the same direction - like two runners in a race - then they can pass each 
> other
> multiple times as each goes slower or faster at any point in time, whereas if
> they are moving in opposite directions there will only ever be one cross-over
> point no matter how the speed of each changes.
> 
> In the case of NIC and software this fact means that there will always be a
> clear cross-over point from DD set to not-set.
Thanks Bruce, that is a great analogy to describe the problem assuming that the 
reads are actually happening in the program order.

On Arm platforms, even though the program is reading in reverse order, the reads 
might get executed in any random order. We have 2 solutions here:
1) Enforce the order with barriers, or
2) Only process descriptors with contiguous DD bits set

> 
> >
> > > For #1, if we read in reverse order, does it make sense to not check
> > > the DD bits of descriptors that are earlier in the order once we
> > > encounter a descriptor that has its DD bit set? This is because NIC 
> > > updates
> the descriptors in order.
> >
> > I think the answer is yes, when we met the first DD bit, we should able to
> calculated the exact number base on the index, but not sure how much
> performance gain.
> >
> The other factors here are:
> 1. The driver does not do a straight read of all 32 DD bits in one go, rather 
> it
> does 8 at a time and aborts at the end of a set of 8 if not all are valid.
> 2. For any that are set, we have to read the descriptor anyway to get the
> packet data out of it, so in the shortcut case of the last descriptor being 
> set,
> we still have to read the other 7 anyway, and DD comes for free as part of it.
> 3. Blindly reading 8 at a time reduces the branching to just a single decision
> point at the end of each set of 8, reducing possible branch mispredicts.
Agree.
I think there is another requirement: the other words in the descriptor should 
be read only after reading the word containing the DD bit.

On x86, the program order takes care of this (although a compiler barrier is 
required).
On Arm, this needs to be taken care of explicitly using barriers.


Re: [dpdk-dev] [EXT] [PATCH v4 0/8] baseband: add NXP LA12xx driver

2021-06-16 Thread Akhil Goyal
> > Subject: [EXT] [PATCH v4 0/8] baseband: add NXP LA12xx driver
> >
> > This series introduces the BBDEV LA12xx poll mode driver (PMD) to support
> > an implementation for offloading High Phy processing functions like
> > LDPC Encode / Decode 5GNR wireless acceleration function, using PCI
> based
> > LA12xx Software defined radio.
> >
> > Please check the documentation patch for more info.
> >
> > The driver currently implements basic feature to offload only the 5G LDPC
> > encode/decode.
> >
> > A new capability has been added to check if the driver can support the
> > input data in network byte order. Two test vectors are also added as an
> > example with input data in network byte.
> >
> > v2: add test case changes
> > v3: fix 32 bit compilation
> > v4: capability for network byte order, doc patch merged inline.
> >
> > Hemant Agrawal (7):
> >   bbdev: add network order data capability
> >   baseband: introduce NXP LA12xx driver
> >   baseband/la12xx: add devargs for max queues
> >   baseband/la12xx: add support for multiple modems
> >   baseband/la12xx: add queue and modem config support
> >   baseband/la12xx: add enqueue and dequeue support
> >   app/bbdev: enable la12xx for bbdev
> >
> > Nipun Gupta (1):
> >   app/bbdev: add test vectors for transport blocks
> >
> This PMD is deferred for next release. Marked as deferred in patchworks.
Hi Hemant,

Any update on this PMD? Is it still planned for 21.08?

Regards,
Akhil


[dpdk-dev] [PATCH 1/4] net/bnxt: fix ring and context memory allocation

2021-06-16 Thread Lance Richardson
Use requested socket ID when allocating memory for transmit rings,
receive rings, and completion queues. Use device NUMA ID when
allocating context memory, notification queue rings, async
completion queue rings, and VNIC attributes.

Fixes: 6eb3cc2294fd ("net/bnxt: add initial Tx code")
Fixes: 9738793f28ec ("net/bnxt: add VNIC functions and structs")
Fixes: f8168ca0e690 ("net/bnxt: support thor controller")
Fixes: bd0a14c99f65 ("net/bnxt: use dedicated CPR for async events")
Fixes: 683e5cf79249 ("net/bnxt: use common NQ ring")
Cc: sta...@dpdk.org
Signed-off-by: Lance Richardson 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_ethdev.c | 17 +
 drivers/net/bnxt/bnxt_ring.c   | 30 ++
 drivers/net/bnxt/bnxt_ring.h   |  2 +-
 drivers/net/bnxt/bnxt_rxq.c|  4 ++--
 drivers/net/bnxt/bnxt_txq.c|  4 ++--
 drivers/net/bnxt/bnxt_vnic.c   |  3 ++-
 6 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index d859ef503..d4b8762d5 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -580,13 +580,14 @@ static int bnxt_register_fc_ctx_mem(struct bnxt *bp)
return rc;
 }
 
-static int bnxt_alloc_ctx_mem_buf(char *type, size_t size,
+static int bnxt_alloc_ctx_mem_buf(struct bnxt *bp, char *type, size_t size,
  struct bnxt_ctx_mem_buf_info *ctx)
 {
if (!ctx)
return -EINVAL;
 
-   ctx->va = rte_zmalloc(type, size, 0);
+   ctx->va = rte_zmalloc_socket(type, size, 0,
+bp->eth_dev->device->numa_node);
if (ctx->va == NULL)
return -ENOMEM;
rte_mem_lock_page(ctx->va);
@@ -610,7 +611,7 @@ static int bnxt_init_fc_ctx_mem(struct bnxt *bp)
sprintf(type, "bnxt_rx_fc_in_" PCI_PRI_FMT, pdev->addr.domain,
pdev->addr.bus, pdev->addr.devid, pdev->addr.function);
/* 4 bytes for each counter-id */
-   rc = bnxt_alloc_ctx_mem_buf(type,
+   rc = bnxt_alloc_ctx_mem_buf(bp, type,
max_fc * 4,
&bp->flow_stat->rx_fc_in_tbl);
if (rc)
@@ -619,7 +620,7 @@ static int bnxt_init_fc_ctx_mem(struct bnxt *bp)
sprintf(type, "bnxt_rx_fc_out_" PCI_PRI_FMT, pdev->addr.domain,
pdev->addr.bus, pdev->addr.devid, pdev->addr.function);
/* 16 bytes for each counter - 8 bytes pkt_count, 8 bytes byte_count */
-   rc = bnxt_alloc_ctx_mem_buf(type,
+   rc = bnxt_alloc_ctx_mem_buf(bp, type,
max_fc * 16,
&bp->flow_stat->rx_fc_out_tbl);
if (rc)
@@ -628,7 +629,7 @@ static int bnxt_init_fc_ctx_mem(struct bnxt *bp)
sprintf(type, "bnxt_tx_fc_in_" PCI_PRI_FMT, pdev->addr.domain,
pdev->addr.bus, pdev->addr.devid, pdev->addr.function);
/* 4 bytes for each counter-id */
-   rc = bnxt_alloc_ctx_mem_buf(type,
+   rc = bnxt_alloc_ctx_mem_buf(bp, type,
max_fc * 4,
&bp->flow_stat->tx_fc_in_tbl);
if (rc)
@@ -637,7 +638,7 @@ static int bnxt_init_fc_ctx_mem(struct bnxt *bp)
sprintf(type, "bnxt_tx_fc_out_" PCI_PRI_FMT, pdev->addr.domain,
pdev->addr.bus, pdev->addr.devid, pdev->addr.function);
/* 16 bytes for each counter - 8 bytes pkt_count, 8 bytes byte_count */
-   rc = bnxt_alloc_ctx_mem_buf(type,
+   rc = bnxt_alloc_ctx_mem_buf(bp, type,
max_fc * 16,
&bp->flow_stat->tx_fc_out_tbl);
if (rc)
@@ -4518,7 +4519,7 @@ static int bnxt_alloc_ctx_mem_blk(struct bnxt *bp,
if (!mz) {
mz = rte_memzone_reserve_aligned(mz_name,
rmem->nr_pages * 8,
-   SOCKET_ID_ANY,
+   bp->eth_dev->device->numa_node,
RTE_MEMZONE_2MB |
RTE_MEMZONE_SIZE_HINT_ONLY |
RTE_MEMZONE_IOVA_CONTIG,
@@ -4541,7 +4542,7 @@ static int bnxt_alloc_ctx_mem_blk(struct bnxt *bp,
if (!mz) {
mz = rte_memzone_reserve_aligned(mz_name,
 mem_size,
-SOCKET_ID_ANY,
+bp->eth_dev->device->numa_node,
 RTE_MEMZONE_1GB |
 RTE_MEMZONE_SIZE_HINT_ONLY |
 RTE_MEMZONE_IOVA_CONTIG,
diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/b

[dpdk-dev] [PATCH 0/4] net/bnxt: various fixes

2021-06-16 Thread Lance Richardson
Several fixes for the bnxt PMD:
   - Fix NUMA-aware memory allocations.
   - Fix transmit descriptor status implementation.
   - Fix handling of transmit completions in non-vector path.
   - Remove dead code.

Lance Richardson (4):
  net/bnxt: fix ring and context memory allocation
  net/bnxt: fix tx desc status implementation
  net/bnxt: fix scalar Tx completion handling
  net/bnxt: remove dead code

 drivers/net/bnxt/bnxt_cpr.h   | 17 ---
 drivers/net/bnxt/bnxt_ethdev.c| 71 +++
 drivers/net/bnxt/bnxt_hwrm.c  |  2 -
 drivers/net/bnxt/bnxt_ring.c  | 31 ++--
 drivers/net/bnxt/bnxt_ring.h  |  2 +-
 drivers/net/bnxt/bnxt_rxq.c   |  4 +-
 drivers/net/bnxt/bnxt_rxr.c   | 11 -
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c |  3 --
 drivers/net/bnxt/bnxt_rxtx_vec_sse.c  |  3 --
 drivers/net/bnxt/bnxt_txq.c   |  4 +-
 drivers/net/bnxt/bnxt_txr.c   | 22 -
 drivers/net/bnxt/bnxt_vnic.c  |  3 +-
 12 files changed, 70 insertions(+), 103 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH 3/4] net/bnxt: fix scalar Tx completion handling

2021-06-16 Thread Lance Richardson
Preserve the raw (unmasked) transmit completion ring
consumer index.

Remove cache prefetches that have no measurable performance
benefit.

Fixes: c7de4195cc4c ("net/bnxt: modify ring index logic")
Cc: sta...@dpdk.org
Signed-off-by: Lance Richardson 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_txr.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_txr.c b/drivers/net/bnxt/bnxt_txr.c
index 27459960d..54eaab34a 100644
--- a/drivers/net/bnxt/bnxt_txr.c
+++ b/drivers/net/bnxt/bnxt_txr.c
@@ -444,30 +444,26 @@ static void bnxt_tx_cmp(struct bnxt_tx_queue *txq, int 
nr_pkts)
 
 static int bnxt_handle_tx_cp(struct bnxt_tx_queue *txq)
 {
+   uint32_t nb_tx_pkts = 0, cons, ring_mask, opaque;
struct bnxt_cp_ring_info *cpr = txq->cp_ring;
uint32_t raw_cons = cpr->cp_raw_cons;
-   uint32_t cons;
-   uint32_t nb_tx_pkts = 0;
+   struct bnxt_ring *cp_ring_struct;
struct tx_cmpl *txcmp;
-   struct cmpl_base *cp_desc_ring = cpr->cp_desc_ring;
-   struct bnxt_ring *cp_ring_struct = cpr->cp_ring_struct;
-   uint32_t ring_mask = cp_ring_struct->ring_mask;
-   uint32_t opaque = 0;
 
if (bnxt_tx_bds_in_hw(txq) < txq->tx_free_thresh)
return 0;
 
+   cp_ring_struct = cpr->cp_ring_struct;
+   ring_mask = cp_ring_struct->ring_mask;
+
do {
cons = RING_CMPL(ring_mask, raw_cons);
txcmp = (struct tx_cmpl *)&cpr->cp_desc_ring[cons];
-   rte_prefetch_non_temporal(&cp_desc_ring[(cons + 2) &
-   ring_mask]);
 
-   if (!CMPL_VALID(txcmp, cpr->valid))
+   if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct))
break;
-   opaque = rte_cpu_to_le_32(txcmp->opaque);
-   NEXT_CMPL(cpr, cons, cpr->valid, 1);
-   rte_prefetch0(&cp_desc_ring[cons]);
+
+   opaque = rte_le_to_cpu_32(txcmp->opaque);
 
if (CMP_TYPE(txcmp) == TX_CMPL_TYPE_TX_L2)
nb_tx_pkts += opaque;
@@ -475,9 +471,11 @@ static int bnxt_handle_tx_cp(struct bnxt_tx_queue *txq)
RTE_LOG_DP(ERR, PMD,
"Unhandled CMP type %02x\n",
CMP_TYPE(txcmp));
-   raw_cons = cons;
+   raw_cons = NEXT_RAW_CMP(raw_cons);
} while (nb_tx_pkts < ring_mask);
 
+   cpr->valid = !!(raw_cons & cp_ring_struct->ring_size);
+
if (nb_tx_pkts) {
if (txq->offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
bnxt_tx_cmp_fast(txq, nb_tx_pkts);
-- 
2.25.1



[dpdk-dev] [PATCH 2/4] net/bnxt: fix tx desc status implementation

2021-06-16 Thread Lance Richardson
With Tx completion batching, a single transmit completion
can correspond to one or more transmit descriptors; adjust
the implementation to account for this.

RTE_ETH_TX_DESC_DONE should be returned for descriptors that
are available for use instead of RTE_ETH_TX_DESC_UNAVAIL.

Fixes: 5735eb241947 ("net/bnxt: support Tx batching")
Fixes: 478ed3bb7b9d ("net/bnxt: support Tx descriptor status")
Cc: sta...@dpdk.org
Signed-off-by: Lance Richardson 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_ethdev.c | 54 +++---
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index d4b8762d5..a4a0142e8 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3296,41 +3296,49 @@ static int
 bnxt_tx_descriptor_status_op(void *tx_queue, uint16_t offset)
 {
struct bnxt_tx_queue *txq = (struct bnxt_tx_queue *)tx_queue;
-   struct bnxt_tx_ring_info *txr;
-   struct bnxt_cp_ring_info *cpr;
-   struct rte_mbuf **tx_buf;
-   struct tx_pkt_cmpl *txcmp;
-   uint32_t cons, cp_cons;
+   struct bnxt_cp_ring_info *cpr = txq->cp_ring;
+   uint32_t ring_mask, raw_cons, nb_tx_pkts = 0;
+   struct bnxt_ring *cp_ring_struct;
+   struct cmpl_base *cp_desc_ring;
int rc;
 
-   if (!txq)
-   return -EINVAL;
-
rc = is_bnxt_in_error(txq->bp);
if (rc)
return rc;
 
-   cpr = txq->cp_ring;
-   txr = txq->tx_ring;
-
if (offset >= txq->nb_tx_desc)
return -EINVAL;
 
-   cons = RING_CMP(cpr->cp_ring_struct, offset);
-   txcmp = (struct tx_pkt_cmpl *)&cpr->cp_desc_ring[cons];
-   cp_cons = cpr->cp_raw_cons;
+   /* Return "desc done" if descriptor is available for use. */
+   if (bnxt_tx_bds_in_hw(txq) <= offset)
+   return RTE_ETH_TX_DESC_DONE;
 
-   if (cons > cp_cons) {
-   if (CMPL_VALID(txcmp, cpr->valid))
-   return RTE_ETH_TX_DESC_UNAVAIL;
-   } else {
-   if (CMPL_VALID(txcmp, !cpr->valid))
-   return RTE_ETH_TX_DESC_UNAVAIL;
+   raw_cons = cpr->cp_raw_cons;
+   cp_desc_ring = cpr->cp_desc_ring;
+   cp_ring_struct = cpr->cp_ring_struct;
+   ring_mask = cpr->cp_ring_struct->ring_mask;
+
+   /* Check to see if hw has posted a completion for the descriptor. */
+   while (1) {
+   struct tx_cmpl *txcmp;
+   uint32_t cons;
+
+   cons = RING_CMPL(ring_mask, raw_cons);
+   txcmp = (struct tx_cmpl *)&cp_desc_ring[cons];
+
+   if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct))
+   break;
+
+   if (CMP_TYPE(txcmp) == TX_CMPL_TYPE_TX_L2)
+   nb_tx_pkts += rte_le_to_cpu_32(txcmp->opaque);
+
+   if (nb_tx_pkts > offset)
+   return RTE_ETH_TX_DESC_DONE;
+
+   raw_cons = NEXT_RAW_CMP(raw_cons);
}
-   tx_buf = &txr->tx_buf_ring[cons];
-   if (*tx_buf == NULL)
-   return RTE_ETH_TX_DESC_DONE;
 
+   /* Descriptor is pending transmit, not yet completed by hardware. */
return RTE_ETH_TX_DESC_FULL;
 }
 
-- 
2.25.1



[dpdk-dev] [PATCH 4/4] net/bnxt: remove dead code

2021-06-16 Thread Lance Richardson
Code related to maintaining completion ring "valid" state is
no longer needed, remove it.

Signed-off-by: Lance Richardson 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Somnath Kotur 
---
 drivers/net/bnxt/bnxt_cpr.h   | 17 -
 drivers/net/bnxt/bnxt_hwrm.c  |  2 --
 drivers/net/bnxt/bnxt_ring.c  |  1 -
 drivers/net/bnxt/bnxt_rxr.c   | 11 ---
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c |  3 ---
 drivers/net/bnxt/bnxt_rxtx_vec_sse.c  |  3 ---
 drivers/net/bnxt/bnxt_txr.c   |  2 --
 7 files changed, 39 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h
index 28c0a9049..2a56ec52c 100644
--- a/drivers/net/bnxt/bnxt_cpr.h
+++ b/drivers/net/bnxt/bnxt_cpr.h
@@ -15,14 +15,6 @@ struct bnxt_db_info;
(!!(rte_le_to_cpu_32(((struct cmpl_base *)(cmp))->info3_v) &\
CMPL_BASE_V) == !((raw_cons) & ((ring)->ring_size)))
 
-#define CMPL_VALID(cmp, v) \
-   (!!(rte_le_to_cpu_32(((struct cmpl_base *)(cmp))->info3_v) &\
-   CMPL_BASE_V) == !(v))
-
-#define NQ_CMP_VALID(nqcmp, raw_cons, ring)\
-   (!!((nqcmp)->v & rte_cpu_to_le_32(NQ_CN_V)) ==  \
-!((raw_cons) & ((ring)->ring_size)))
-
 #define CMP_TYPE(cmp)  \
(((struct cmpl_base *)cmp)->type & CMPL_BASE_TYPE_MASK)
 
@@ -35,18 +27,10 @@ struct bnxt_db_info;
 #define RING_CMP(ring, idx)((idx) & (ring)->ring_mask)
 #define RING_CMPL(ring_mask, idx)  ((idx) & (ring_mask))
 #define NEXT_CMP(idx)  RING_CMP(ADV_RAW_CMP(idx, 1))
-#define FLIP_VALID(cons, mask, val)((cons) >= (mask) ? !(val) : (val))
 
 #define DB_CP_REARM_FLAGS  (DB_KEY_CP | DB_IDX_VALID)
 #define DB_CP_FLAGS(DB_KEY_CP | DB_IDX_VALID | DB_IRQ_DIS)
 
-#define NEXT_CMPL(cpr, idx, v, inc)do { \
-   (idx) += (inc); \
-   if (unlikely((idx) >= (cpr)->cp_ring_struct->ring_size)) { \
-   (v) = !(v); \
-   (idx) = 0; \
-   } \
-} while (0)
 #define B_CP_DB_REARM(cpr, raw_cons)   \
rte_write32((DB_CP_REARM_FLAGS |\
DB_RING_IDX(&((cpr)->cp_db), raw_cons)),\
@@ -107,7 +91,6 @@ struct bnxt_cp_ring_info {
uint32_thw_stats_ctx_id;
 
struct bnxt_ring*cp_ring_struct;
-   boolvalid;
 };
 
 #define RX_CMP_L2_ERRORS   \
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 6eab2342f..9ec65ad22 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -2670,7 +2670,6 @@ void bnxt_free_nq_ring(struct bnxt *bp, struct 
bnxt_cp_ring_info *cpr)
memset(cpr->cp_desc_ring, 0, cpr->cp_ring_struct->ring_size *
 sizeof(*cpr->cp_desc_ring));
cpr->cp_raw_cons = 0;
-   cpr->valid = 0;
 }
 
 void bnxt_free_cp_ring(struct bnxt *bp, struct bnxt_cp_ring_info *cpr)
@@ -2684,7 +2683,6 @@ void bnxt_free_cp_ring(struct bnxt *bp, struct 
bnxt_cp_ring_info *cpr)
memset(cpr->cp_desc_ring, 0, cpr->cp_ring_struct->ring_size *
sizeof(*cpr->cp_desc_ring));
cpr->cp_raw_cons = 0;
-   cpr->valid = 0;
 }
 
 void bnxt_free_hwrm_rx_ring(struct bnxt *bp, int queue_index)
diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/bnxt_ring.c
index 4a90ac264..cb18dfba7 100644
--- a/drivers/net/bnxt/bnxt_ring.c
+++ b/drivers/net/bnxt/bnxt_ring.c
@@ -769,7 +769,6 @@ int bnxt_alloc_async_cp_ring(struct bnxt *bp)
return rc;
 
cpr->cp_raw_cons = 0;
-   cpr->valid = 0;
bnxt_set_db(bp, &cpr->cp_db, ring_type, 0,
cp_ring->fw_ring_id, cp_ring->ring_mask);
 
diff --git a/drivers/net/bnxt/bnxt_rxr.c b/drivers/net/bnxt/bnxt_rxr.c
index 756a45ba9..0dee73af8 100644
--- a/drivers/net/bnxt/bnxt_rxr.c
+++ b/drivers/net/bnxt/bnxt_rxr.c
@@ -297,9 +297,6 @@ static int bnxt_agg_bufs_valid(struct bnxt_cp_ring_info 
*cpr,
raw_cp_cons = ADV_RAW_CMP(raw_cp_cons, agg_bufs);
last_cp_cons = RING_CMP(cpr->cp_ring_struct, raw_cp_cons);
agg_cmpl = (struct rx_pkt_cmpl *)&cpr->cp_desc_ring[last_cp_cons];
-   cpr->valid = FLIP_VALID(raw_cp_cons,
-   cpr->cp_ring_struct->ring_mask,
-   cpr->valid);
return CMP_VALID(agg_cmpl, raw_cp_cons, cpr->cp_ring_struct);
 }
 
@@ -898,10 +895,6 @@ static int bnxt_rx_pkt(struct rte_mbuf **rx_pkt,
if (!CMP_VALID(rxcmp1, tmp_raw_cons, cpr->cp_ring_struct))
return -EBUSY;
 
-   cpr->valid = FLIP_VALID(cp_cons,
-   cpr->cp_ring_struct->ring_mask,
-   cpr->valid);
-
if (cmp_type == RX_TPA_START_CMPL_TYPE_RX_TPA_START ||
cmp_type == RX_TPA_START_V2_CMPL_TYPE_RX_TP

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Dmitry Kozlyuk
2021-06-16 18:29 (UTC+0530), Jerin Jacob:
> On Wed, Jun 16, 2021 at 5:52 PM Burakov, Anatoly
>  wrote:
> >
> > On 16-Jun-21 10:42 AM, Jerin Jacob wrote:  
> > > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon  
> > > wrote:  
> > >>
> > >> 14/06/2021 17:48, Morten Brørup:  
> >  From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon  
> > >>> It would be much simpler to just increase RTE_MAX_ETHPORTS to something 
> > >>> big enough to hold a sufficiently large array. And possibly add an 
> > >>> rte_max_ethports variable to indicate the number of populated entries 
> > >>> in the array, for use when iterating over the array.
> > >>>
> > >>> Can we come up with another example than RTE_MAX_ETHPORTS where this 
> > >>> library provides a better benefit?  
> > >>
> > >> What is big enough?
> > >> Is 640KB enough for RAM? ;)  
> > >
> > > If I understand it correctly, a Linux process allocates 640KB due to
> > > the fact that currently
> > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and it
> > > is from BSS.
> > >
> > > If we make this from the heap, i.e. use malloc() to allocate this memory,
> > > then in my understanding Linux
> > > really won't allocate the real pages for backing memory unless
> > > someone writes/reads to this memory.
> > >
> > > i.e it will be free virtual memory using Linux memory management help.
> > > If so, we can keep large values for RTE_MAX_ETHPORTS
> > > without wasting any "real" memory even though the system has a few ports.
> > >
> > > Thoughts?
> > >  
> >
> > mmap works this way with anonymous memory, i'm not so sure about
> > malloc()'ed memory.  
> 
> Looking at online documentation scattered over the internet, sbrk() is
> based on demand paging.
> So I am not sure as well. I am also not sure how we can write some
> test case to verify it.
> A huge allocation through malloc() not failing, I am not sure whether it is
> due to demand paging
> or the Linux overcommit feature, or a combination of both,
> 
> if mmap works in this way, we could have EAL abstraction for such
> memory alloc like
> eal_malloc_demand_page() or so and if Windows also supports it.
> 
> 
> 
> > Plus, we can't base these decisions on what Linux
> > does because we support other OS's. Do they do this as well?  
> 
> + Windows OS maintainers

Yes, Windows uses demand paging.

Is it true that BSS is eagerly allocated (i. e. RAM consumed)? If not, and it
shouldn't be, malloc() isn't needed unless hugepages are required.


Re: [dpdk-dev] [PATCH] net/bnxt: add support to dump SFP module info

2021-06-16 Thread Ajit Khaparde
On Tue, Jun 8, 2021 at 7:23 PM Kalesh A P
 wrote:
>
> From: Kalesh AP 
>
> Add support to fetch the SFP EEPROM settings from the firmware.
> For SFP+ modules we will display 0xA0 page for status and 0xA2 page
> for other information. For QSFP modules we will show the 0xA0 page.
>
> Also identify the module types for QSFP28, QSFP, QSFP+ apart
> from the SFP modules and return an error for 10GBase-T PHY.
>
> Signed-off-by: Kalesh AP 
> Reviewed-by: Somnath Kotur 
> Reviewed-by: Ajit Khaparde 
> Reviewed-by: Venkat Duvvuru 

Patch applied to the for-next-net branch of dpdk-next-net-brcm.

> ---
>  doc/guides/nics/features/bnxt.ini  |   1 +
>  drivers/net/bnxt/bnxt.h|  15 
>  drivers/net/bnxt/bnxt_ethdev.c | 145 
> +
>  drivers/net/bnxt/bnxt_hwrm.c   |  36 
>  drivers/net/bnxt/bnxt_hwrm.h   |   3 +
>  drivers/net/bnxt/hsi_struct_def_dpdk.h |  83 +++
>  6 files changed, 283 insertions(+)
>
> diff --git a/doc/guides/nics/features/bnxt.ini 
> b/doc/guides/nics/features/bnxt.ini
> index 291faaa..b6eaca8 100644
> --- a/doc/guides/nics/features/bnxt.ini
> +++ b/doc/guides/nics/features/bnxt.ini
> @@ -42,6 +42,7 @@ Extended stats   = Y
>  Stats per queue  = Y
>  FW version   = Y
>  EEPROM dump  = Y
> +Module EEPROM dump   = Y
>  LED  = Y
>  Multiprocess aware   = Y
>  FreeBSD  = Y
> diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
> index e93a7eb..8ec8ddc 100644
> --- a/drivers/net/bnxt/bnxt.h
> +++ b/drivers/net/bnxt/bnxt.h
> @@ -292,6 +292,7 @@ struct bnxt_link_info {
> uint16_tauto_pam4_link_speeds;
> uint16_tsupport_pam4_auto_speeds;
> uint8_t req_signal_mode;
> +   uint8_t module_status;
>  };
>
>  #define BNXT_COS_QUEUE_COUNT   8
> @@ -965,6 +966,20 @@ struct bnxt_vf_rep_tx_queue {
> struct bnxt_representor *bp;
>  };
>
> +#define I2C_DEV_ADDR_A00xa0
> +#define I2C_DEV_ADDR_A20xa2
> +#define SFF_DIAG_SUPPORT_OFFSET0x5c
> +#define SFF_MODULE_ID_SFP  0x3
> +#define SFF_MODULE_ID_QSFP 0xc
> +#define SFF_MODULE_ID_QSFP_PLUS0xd
> +#define SFF_MODULE_ID_QSFP28   0x11
> +#define SFF8636_FLATMEM_OFFSET 0x2
> +#define SFF8636_FLATMEM_MASK   0x4
> +#define SFF8636_OPT_PAGES_OFFSET   0xc3
> +#define SFF8636_PAGE1_MASK 0x40
> +#define SFF8636_PAGE2_MASK 0x80
> +#define BNXT_MAX_PHY_I2C_RESP_SIZE 64
> +
>  int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu);
>  int bnxt_link_update(struct rte_eth_dev *eth_dev, int wait_to_complete,
>  bool exp_link_status);
> diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
> index c9536f7..30aa0ef 100644
> --- a/drivers/net/bnxt/bnxt_ethdev.c
> +++ b/drivers/net/bnxt/bnxt_ethdev.c
> @@ -3851,6 +3851,149 @@ bnxt_set_eeprom_op(struct rte_eth_dev *dev,
>  in_eeprom->data, in_eeprom->length);
>  }
>
> +static int bnxt_get_module_info(struct rte_eth_dev *dev,
> +   struct rte_eth_dev_module_info *modinfo)
> +{
> +   uint8_t module_info[SFF_DIAG_SUPPORT_OFFSET + 1];
> +   struct bnxt *bp = dev->data->dev_private;
> +   int rc;
> +
> +   /* No point in going further if phy status indicates
> +* module is not inserted or if it is powered down or
> +* if it is of type 10GBase-T
> +*/
> +   if (bp->link_info->module_status >
> +   HWRM_PORT_PHY_QCFG_OUTPUT_MODULE_STATUS_WARNINGMSG) {
> +   PMD_DRV_LOG(NOTICE, "Port %u : Module is not inserted or is 
> powered down\n",
> +   dev->data->port_id);
> +   return -ENOTSUP;
> +   }
> +
> +   /* This feature is not supported in older firmware versions */
> +   if (bp->hwrm_spec_code < 0x10202) {
> +   PMD_DRV_LOG(NOTICE, "Port %u : Feature is not supported in 
> older firmware\n",
> +   dev->data->port_id);
> +   return -ENOTSUP;
> +   }
> +
> +   rc = bnxt_hwrm_read_sfp_module_eeprom_info(bp, I2C_DEV_ADDR_A0, 0, 0,
> +  SFF_DIAG_SUPPORT_OFFSET + 
> 1,
> +  module_info);
> +
> +   if (rc)
> +   return rc;
> +
> +   switch (module_info[0]) {
> +   case SFF_MODULE_ID_SFP:
> +   modinfo->type = RTE_ETH_MODULE_SFF_8472;
> +   modinfo->eeprom_len = RTE_ETH_MODULE_SFF_8472_LEN;
> +   if (module_info[SFF_DIAG_SUPPORT_OFFSET] == 0)
> +   modinfo->eeprom_len = RTE_ETH_MODULE_SFF_8436_LEN;
> +   break;
> +   case SFF_MODULE_ID_QSFP:
> +   case SFF_MODULE_ID_QSFP_P

[dpdk-dev] [Bug 741] [dpdk-21.05]vhost relaunch meet Segmentation fault issue when virtio queues larger than vhost queues

2021-06-16 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=741

Bug ID: 741
   Summary: [dpdk-21.05]vhost relaunch meet Segmentation fault
issue when virtio queues larger than vhost queues
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: vhost/virtio
  Assignee: dev@dpdk.org
  Reporter: weix.l...@intel.com
  Target Milestone: ---

Issue description:
vhost relaunch meets a segmentation fault when the virtio queue count is
larger than the vhost queue count.

Test Environment:
DPDK version: DPDK v21.05
OS: Linux 5.4
Compiler: gcc 9.3
CPU: 8280m

The reproduction steps are:
1. launch two vhost ports :

./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -n 4 -l 2-4 --file-prefix=vhost 
--vdev 'net_vhost0,iface=/tmp/s0,client=1,queues=1' --vdev
'net_vhost1,iface=/tmp/s1,client=1,queues=1' -- -i --nb-cores=1 --txd=1024
--rxd=1024
testpmd>start
2. launch one virtio-user:

./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -n 4 -l 5-6 --no-pci
--file-prefix=virtio0 \
--vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=/tmp/s0,server=1,mrg_rxbuf=0,in_order=0,queues=1
\
-- -i --tx-offloads=0x0 --enable-hw-vlan-strip --nb-cores=1 --txd=1024
--rxd=1024
testpmd>start
3. launch another virtio-user with queue number=8:

 ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -n 4 -l 7-8 --no-pci
--file-prefix=virtio1 \
--vdev=net_virtio_user1,mac=00:01:02:03:04:55,path=/tmp/s1,server=1,mrg_rxbuf=0,in_order=0,queues=8
\
-- -i --tx-offloads=0x0 --enable-hw-vlan-strip --nb-cores=1 --txd=1024
--rxd=1024
testpmd>start tx_first 32
4. relaunch two vhost ports :

./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -n 4 -l 2-4 --file-prefix=vhost 
--vdev 'net_vhost0,iface=/tmp/s0,client=1,queues=1' --vdev
'net_vhost1,iface=/tmp/s1,client=1,queues=1' -- -i --nb-cores=1 --txd=1024
--rxd=1024

Result:
Checking link statuses...
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 0 to qp idx: 0
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 0 to qp idx: 0
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 0 to qp idx: 1
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 0 to qp idx: 1
VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
VHOST_CONFIG: vring base idx:0 file:0
VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
VHOST_CONFIG: vring base idx:0 file:0
VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
VHOST_CONFIG: vring base idx:1 file:0
VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
VHOST_CONFIG: vring base idx:1 file:0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
Segmentation fault

Expected Result:
the vhost ports can relaunch successfully and 1 queue can work.


Is this issue a regression: Y
Version the regression was introduced: Specify git id if known.
root@dpdk:~/dpdk# git bisect bad 
commit 7804bbd13aa583e3a28b08557f7a98fcbe7fc8a8 (HEAD -> main)
Author: Maxime Coquelin 
Date:   Fri Nov 6 15:47:44 2020 +0100

vhost: fix virtqueue initialization

This patches fixes virtqueue initialization issue causing
segfault or file descriptor being closed unexpectedly.

The wrong index was passed to init_vring_queue() by
alloc_vring_queue() when a hole in the virtqueue array was
met.

Fixes: 8acd7c213353 ("vhost: fix virtqueues metadata allocation")
Cc: sta...@dpdk.org

Reported-by: Yu Jiang 
Signed-off-by: Maxime Coquelin 
Reviewed-by: David Marchand 
Tested-by: Yu Jiang 

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [dpdk-dev] [PATCH 0/4] bnxt fixes

2021-06-16 Thread Ajit Khaparde
On Tue, Jun 8, 2021 at 7:52 PM Kalesh A P
 wrote:
>
> From: Kalesh AP 
>
> This set contains few bnxt fixes and code cleanup changes.

Patchset applied to dpdk-next-net-brcm for-next-net branch.

>
> Kalesh AP (4):
>   net/bnxt: cleanup code
>   net/bnxt: fix typo in log message
>   net/bnxt: fix enabling autoneg on Whitney+
>   net/bnxt: invoke device removal event on recovery failure
>
>  drivers/net/bnxt/bnxt_ethdev.c | 11 ---
>  drivers/net/bnxt/bnxt_hwrm.c   | 22 ++
>  2 files changed, 18 insertions(+), 15 deletions(-)
>
> --
> 2.10.1
>


Re: [dpdk-dev] [PATCH v2] app/testpmd: send failure logs to stderr

2021-06-16 Thread Li, Xiaoyun
Hi

> -Original Message-
> From: Andrew Rybchenko 
> Sent: Thursday, June 17, 2021 00:32
> To: Li, Xiaoyun ; Ori Kam 
> Cc: dev@dpdk.org; Richardson, Bruce ; Yigit,
> Ferruh ; Singh, Aman Deep
> 
> Subject: [PATCH v2] app/testpmd: send failure logs to stderr
> 
> Running with stdout suppressed or redirected for further processing
> is very confusing in the case of errors. Fix it by logging errors and
> warnings to stderr.
> 
> Since lines with log messages are touched anyway concatanate split

Typo: "concatenate"
And it's really good to have those split strings combined. Thanks.

> format string to make it easier to search using grep.
> 
> Fix indent of format string arguments.
> 
> Signed-off-by: Andrew Rybchenko 
> ---
> v2:
>  - switch from printf() to fprintf(stderr, ...) in more cases
>  - do not inherit acks from the previous version since the patch is
>much bigger
>  - fix style in a few cases (TAB vs spaces, missing space separator, etc.)
>  - still don't use TESTPMD_LOG() since the patch does not add new logs.
>Also switching to TESTPMD_LOG() will add "testpmd: " prefix to log
>messages and it is a real change and could be a pain for automation.
> 
>  app/test-pmd/bpf_cmd.c |   6 +-
>  app/test-pmd/cmdline.c | 957 ++---
>  app/test-pmd/cmdline_flow.c|  20 +-
>  app/test-pmd/cmdline_mtr.c |   8 +-
>  app/test-pmd/cmdline_tm.c  |  33 +-
>  app/test-pmd/config.c  | 452 ++--
>  app/test-pmd/csumonly.c|   5 +-
>  app/test-pmd/parameters.c  |  21 +-
>  app/test-pmd/testpmd.c | 298 
>  app/test-pmd/util.c|  19 +-
>  doc/guides/rel_notes/release_21_08.rst |   5 +
>  11 files changed, 1002 insertions(+), 822 deletions(-)

> 2.30.2

This patch overall looks good to me.

But there're some warnings about coding styles reported. 
http://mails.dpdk.org/archives/test-report/2021-June/199047.html
Not all of them make sense.
But can you check ones with QUOTED_WHITESPACE_BEFORE_NEWLINE, UNSPECIFIED_INT, 
EMBEDDED_FUNCTION_NAME?
The latter two types of warnings are legacy issues, but it'll be good to have
them fixed.

BRs
Xiaoyun


Re: [dpdk-dev] [PATCH v1] net/i40e: fix flow director does not work

2021-06-16 Thread Zhang, Qi Z



> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, June 17, 2021 1:23 AM
> To: Xing, Beilei ; Yang, SteveX
> ; Zhang, Qi Z 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] net/i40e: fix flow director does not work
> 
> 01/06/2021 13:12, Zhang, Qi Z:
> > > > When user configured the flow rule with raw packet via command
> > > > "flow_director_filter", it would reset all previous fdir input set
> > > > flags with "i40e_flow_set_fdir_inset()".
> > > >
> > > > Ignore to configure the flow input set with raw packet rule used.
> > > >
> > > > Fixes: ff04964ea6d5 ("net/i40e: fix flow director for common
> > > > pctypes")
> > > >
> > > > Signed-off-by: Steve Yang 
> > >
> > > Acked-by: Beilei Xing 
> >
> > Applied to dpdk-next-net-intel.
> 
> Why Cc:stable is not added?
> 

Yes, the patch needs to be backported, as the original patch is a fix that also
needs to be backported. Thanks for catching this!




Re: [dpdk-dev] [PATCH] kni: fix compilation on SLES15-SP3

2021-06-16 Thread Christian Ehrhardt
On Thu, Jun 10, 2021 at 12:30 PM Christian Ehrhardt
 wrote:
>
> On Thu, Jun 10, 2021 at 10:39 AM Christian Ehrhardt
>  wrote:
> >
> > On Tue, Jun 8, 2021 at 1:17 PM Ferruh Yigit  wrote:
> > >
> > > On 6/2/2021 3:33 PM, Christian Ehrhardt wrote:
> > > > Like what was done for mainline kernel in commit 38ad54f3bc76 ("kni: fix
> > > > build with Linux 5.6"), a new parameter 'txqueue' has to be added to
> > > > 'ndo_tx_timeout' ndo on SLES 15-SP3 kernel.
> > > >
> > > > Caused by:
> > > >   commit c3bf155c40e9db722feb8a08c19efd44c12d5294
> > > >   Author: Thomas Bogendoerfer 
> > > >   Date:   Fri Sep 11 16:08:31 2020 +0200
> > > >   - netdev: pass the stuck queue to the timeout handler
> > > > (jsc#SLE-13536).
> > > >   - Refresh patches.suse/sfc-move-various-functions.patch.
> > > >
> > > > That is part of the SLES 5.3.18 kernel and therefore the
> > > > version we check for.
> > > >
> > > > Cc: sta...@dpdk.org
> > > >
> > > > Signed-off-by: Christian Ehrhardt 
> > >
> > > Hi Christian,
> > >
> > > There is a build error reported in CI [1] with 'SUSE15-64'.
> > > Can't the check 'linux version >= 5.3.18" may hit multiple SUSE versions, 
> > > with
> > > some has the patch mentioned above backported and some did not?
> > > Can 'SLE_VERSION_CODE' be used to differentiate the SUSE versions?
> >
> > I don't have a perfect insight in the SUSE distro variants and their
> > kernel versions.
> > > 5.3.18 in SLES15-SP3 was what broke it and I have hoped that this would 
> > > apply in general.
> > But the error above seems we have others that are > 5.3.18 but at the
> > same time not have the backport.
> >
> > I'll try to create a v3, but do we have anyone from Suse to usually
> > directly ping for feedback on this?
>
> With the new version (not submitted since it fails me) you can have a
> look at my personal WIP branch:
> => 
> https://github.com/cpaelzer/dpdk-stable-queue/commit/43b908fe83e9cd68b08e259c0ace26ec692bb737

Hello everyone,
Ferruh and I reached out to the Suse people working on DPDK in the
past as well as those doing the kernel backport that breaks it now.
(I'll add them to CC here as well)
Unfortunately there was no feedback in a week, but OTOH I also don't
want to stall releases for too long due to this.

I'll try to summarize the current understanding of this case again

[1] breaks our KNI build.

SLE_VERSION isn't provided by their kernel; it is defined in DPDK's
kernel/linux/kni/compat.h and has not been maintained for a while.
So we can't differentiate SLE15 SP2 vs SLE15 SP3 via that.

The offending change was introduced in their kernel by [1]
$ git tag --contains c3bf155c40e9 | sort | head
rpm-5.3.18-24
...

But checking just the kernel version 5.3.18 (as my initial patch had)
won't work either.
The problem is that this only checks the three levels of kernel
version, but not the packaging level.
And to make things even more fun, while I don't know if opensuse leap
has the patch applied or not atm, but the kernel version there might
make this even more complex as it is 5.3.18-lp152 at the moment.

We have now:
SLE15 SP2 5.3.18-22
SLE15 SP3 5.3.18-57 (>=24)
opensuse_leap 5.3.18-lp152

Without a change SLE15SP3 is broken due to that backport.
By checking on >=5.3.18 we could fix SP3, but break SP2 and maybe opensuse_leap.

Maybe there is something on LOCALVERSION/EXTRAVERSION we can use, but
"guessing" how the Suse kernel behaves isn't a good approach.
Once Suse lets us know how to better differentiate their packaging
version we can reconsider a proper fix for this.

But without further input from Suse I'd (for now) ask to keep things
as is (= not applying my patch).
Due to that it will build in the same places it has built in the past.
If we find a solution it can be in the next release in ~3 months, but
I'll not further stall e.g. 19.11.9 that I'm working on right now.

[1]: https://github.com/SUSE/kernel/commit/c3bf155c40e9



> Now "my SLE15" fails
>
> [  232s] 
> /home/abuild/rpmbuild/BUILD/dpdk-1623314498.43b908fe8/x86_64-default-linux-gcc/build/kernel/linux/kni/kni_net.c:791:20:
> error: initialization from incompatible pointer type
> [-Werror=incompatible-pointer-types]
> [  232s]   .ndo_tx_timeout = kni_net_tx_timeout,
> [  232s] ^~
>
> The full log is at
> https://build.opensuse.org/package/live_build_log/home:cpaelzer:branches:home:bluca:dpdk/dpdk-19.11/SLE_15/x86_64
>
> That means the check is no longer catching this SLE15 with their
> linux-5.3.18-57.
>
> I need to find what SLE_VERSION_CODE really is in that environment :-/
> Sadly I don't have that system locally.
>
> So the V3 will be delayed, but any additional input is welcome.
>
> P.S. or I can't make an >= check on SLE_VERSION_CODE ...?
>
>
> > > [1]
> > > http://mails.dpdk.org/archives/test-report/2021-June/197571.html
> > >
> > > > ---
> > > >  kernel/linux/kni/compat.h | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/linux/kni/co

[dpdk-dev] 19.11.9 patches review and test

2021-06-16 Thread Christian Ehrhardt
Hi all,

Here is a list of patches targeted for stable release 19.11.9.

The planned date for the final release is 2nd July.

Please help with testing and validation of your use cases and report
any issues/results with reply-all to this mail. For the final release
the fixes and reported validations will be added to the release notes.

A renewed release candidate tarball can be found at:

https://dpdk.org/browse/dpdk-stable/tag/?id=v19.11.9-rc3

These patches are located at branch 19.11 of dpdk-stable repo:
https://dpdk.org/browse/dpdk-stable/

Thanks.

Christian Ehrhardt 

---
Adam Dybkowski (3):
  common/qat: increase IM buffer size for GEN3
  compress/qat: enable compression on GEN3
  crypto/qat: fix null authentication request

Ajit Khaparde (3):
  net/bnxt: fix RSS context cleanup
  net/bnxt: fix mismatched type comparison in MAC restore
  net/bnxt: check PCI config read

Alvin Zhang (7):
  net/ice: fix VLAN filter with PF
  net/i40e: fix input set field mask
  net/e1000: fix Rx error counter for bad length
  net/e1000: fix max Rx packet size
  net/ice: fix fast mbuf freeing
  net/iavf: fix VF to PF command failure handling
  net/i40e: fix VF RSS configuration

Anatoly Burakov (3):
  fbarray: fix log message on truncation error
  power: do not skip saving original P-state governor
  power: save original ACPI governor always

Andrew Rybchenko (2):
  net/failsafe: fix RSS hash offload reporting
  net/failsafe: report minimum and maximum MTU

Andy Moreton (1):
  common/sfc_efx/base: limit reported MCDI response length

Apeksha Gupta (1):
  examples/l2fwd-crypto: skip masked devices

Arek Kusztal (1):
  crypto/qat: fix offset for out-of-place scatter-gather

Beilei Xing (2):
  net/i40evf: fix packet loss for X722
  net/iavf: fix Tx context descriptor

Bruce Richardson (1):
  build: exclude meson files from examples installation

Chaoyong He (1):
  doc: fix multiport syntax in nfp guide

Chenbo Xia (1):
  examples/vhost: check memory table query

Chengchang Tang (13):
  ethdev: validate input in module EEPROM dump
  ethdev: validate input in register info
  ethdev: validate input in EEPROM info
  net/hns3: fix rollback after setting PVID failure
  examples: add eal cleanup to examples
  net/bonding: fix adding itself as its slave
  app/testpmd: fix max queue number for Tx offloads
  net/tap: fix interrupt vector array size
  net/bonding: fix socket ID check
  net/tap: check ioctl on restore
  net/hns3: fix HW buffer size on MTU update
  net/hns3: fix processing Tx offload flags
  examples/timer: fix time interval

Chengwen Feng (32):
  net/hns3: fix flow counter value
  net/hns3: fix VF mailbox head field
  net/hns3: support get device version when dump register
  test: check thread creation
  common/dpaax: fix possible null pointer access
  examples/ethtool: remove unused parsing
  net/e1000/base: fix timeout for shadow RAM write
  mbuf: check shared memory before dumping dynamic space
  eventdev: remove redundant thread name setting
  eventdev: fix memory leakage on thread creation failure
  net/kni: check init result
  net/hns3: fix mailbox error message
  net/hns3: remove unused mailbox macro and struct
  net/bonding: fix leak on remove
  net/i40e: fix negative VEB index
  net/i40e: remove redundant VSI check in Tx queue setup
  net/hns3: log time delta in decimal format
  net/hns3: remove unused macros
  net/hns3: remove unused VMDq code
  raw/ntb: check SPAD user index
  raw/ntb: check memory allocations
  ipc: check malloc sync reply result
  eal: fix service core list parsing
  net/hns3: fix handling link update
  ipc: use monotonic clock
  net/hns3: return error on PCI config write failure
  net/hns3: clear hash map on flow director clear
  net/hns3: fix querying flow director counter for out param
  net/hns3: fix secondary process request start/stop Rx/Tx
  net/hns3: fix ordering in secondary process initialization
  net/mlx4: fix secondary process initialization ordering
  net/mlx5: fix secondary process initialization ordering

Christian Ehrhardt (5):
  vfio: fix stdbool usage without include
  kni: fix compilation on SLES15-SP3
  version: 19.11.9-rc1
  version: 19.11.9-rc2
  Revert "kni: fix compilation on SLES15-SP3"

Ciara Loftus (1):
  net/af_xdp: fix error handling during Rx queue setup

Conor Walsh (1):
  examples/l3fwd: fix LPM IPv6 subnets

Dapeng Yu (2):
  net/e1000: remove MTU setting limitation
  examples/packet_ordering: fix port configuration

David Christensen (1):
  config/ppc: reduce number of cores and NUMA nodes

David Harton (1):
  net/ena: fix releasing Tx ring mbufs

David Hunt (4):
  test/power: fix CPU frequency check
  test/power: add

Re: [dpdk-dev] [PATCH] kni: fix compilation on SLES15-SP3

2021-06-16 Thread Thomas Monjalon
17/06/2021 08:14, Christian Ehrhardt:
> On Thu, Jun 10, 2021 at 12:30 PM Christian Ehrhardt
>  wrote:
> > On Thu, Jun 10, 2021 at 10:39 AM Christian Ehrhardt
> >  wrote:
> > > On Tue, Jun 8, 2021 at 1:17 PM Ferruh Yigit  
> > > wrote:
> > > > On 6/2/2021 3:33 PM, Christian Ehrhardt wrote:
> > > > > Like what was done for mainline kernel in commit 38ad54f3bc76 ("kni: 
> > > > > fix
> > > > > build with Linux 5.6"), a new parameter 'txqueue' has to be added to
> > > > > 'ndo_tx_timeout' ndo on SLES 15-SP3 kernel.
> > > > >
> > > > > Caused by:
> > > > >   commit c3bf155c40e9db722feb8a08c19efd44c12d5294
> > > > >   Author: Thomas Bogendoerfer 
> > > > >   Date:   Fri Sep 11 16:08:31 2020 +0200
> > > > >   - netdev: pass the stuck queue to the timeout handler
> > > > > (jsc#SLE-13536).
> > > > >   - Refresh patches.suse/sfc-move-various-functions.patch.
> > > > >
> > > > > That is part of the SLES 5.3.18 kernel and therefore the
> > > > > version we check for.
> > > > >
> > > > > Cc: sta...@dpdk.org
> > > > >
> > > > > Signed-off-by: Christian Ehrhardt 
> > > >
> > > > Hi Christian,
> > > >
> > > > There is a build error reported in CI [1] with 'SUSE15-64'.
> > > > Can't the check 'linux version >= 5.3.18' hit multiple SUSE versions,
> > > > with some having the patch mentioned above backported and some not?
> > > > Can 'SLE_VERSION_CODE' be used to differentiate the SUSE versions?
> > >
> > > I don't have a perfect insight in the SUSE distro variants and their
> > > kernel versions.
> > > > 5.3.18 in SLES15-SP3 was what broke it and I had hoped that this would
> > > > apply in general.
> > > But the error above seems we have others that are > 5.3.18 but at the
> > > same time not have the backport.
> > >
> > > I'll try to create a v3, but do we have anyone from Suse to usually
> > > directly ping for feedback on this?
> >
> > With the new version (not submitted since it fails me) you can have a
> > look at my personal WIP branch:
> > => 
> > https://github.com/cpaelzer/dpdk-stable-queue/commit/43b908fe83e9cd68b08e259c0ace26ec692bb737
> 
> Hello everyone,
> Ferruh and I reached out to the Suse people working on DPDK in the
> past as well as those doing the kernel backport that breaks it now.
> (I'll add them to CC here as well)
> Unfortunately there was no feedback in a week, but OTOH I also don't
> want to stall releases for too long due to this.
> 
> I'll try to summarize the current understanding of this case again
> 
> [1] breaks our KNI build.
> 
> SLE_VERSION isn't provided by their Kernel; it is in DPDKs
> kernel/linux/kni/compat.h and not further maintained for a while.
> So we can't differentiate SLE15SP2 vs SLE15SP3 via that.
> 
> The offending change was introduced in their kernel by [1]
> $ git tag --contains c3bf155c40e9 | sort | head
> rpm-5.3.18-24
> ...
> 
> But checking just the kernel version 5.3.18 (as my initial patch had)
> won't work either.
> The problem is that this only checks the three levels of kernel
> version, but not the packaging level.
> And to make things even more fun, while I don't know if opensuse leap
> has the patch applied or not atm, but the kernel version there might
> make this even more complex as it is 5.3.18-lp152 at the moment.
> 
> We have now:
> SLE15 SP2 5.3.18-22
> SLE15 SP3 5.3.18-57 (>=24)
> opensuse_leap 5.3.18-lp152
> 
> Without a change SLE15SP3 is broken due to that backport.
> By checking on >=5.3.18 we could fix SP3, but break SP2 and maybe 
> opensuse_leap.
> 
> Maybe there is something on LOCALVERSION/EXTRAVERSION we can use, but
> "guessing" how the Suse kernel behaves isn't a good approach.
> Once Suse lets us know how to better differentiate their packaging
> version we can reconsider a proper fix for this.
> 
> But without further input from Suse I'd (for now) ask to keep things
> as is (= not applying my patch).
> Due to that it will build in the same places it has built in the past.
> If we find a solution it can be in the next release in ~3 months, but
> I'll not further stall e.g. 19.11.9 that I'm working on right now.
> 
> [1]: https://github.com/SUSE/kernel/commit/c3bf155c40e9

Thank you for the summary.

This explains well why we should stop supporting KNI.




Re: [dpdk-dev] [PATCH] acl: fix build with GCC 6.3

2021-06-16 Thread Thomas Monjalon
21/05/2021 16:42, Konstantin Ananyev:
> --buildtype=debug with gcc 6.3 produces the following error:
> 
> ../lib/librte_acl/acl_run_avx512_common.h: In function
> ‘resolve_match_idx_avx512x16’:
> ../lib/librte_acl/acl_run_avx512x16.h:33:18: error:
>   the last argument must be an 8-bit immediate
>^
> ../lib/librte_acl/acl_run_avx512_common.h:373:9: note:
>   in expansion of macro ‘_M_I_’
>   return _M_I_(slli_epi32)(mi, match_log);
>  ^
> 
> Seems like gcc-6.3 complains about the following construct:
> 
> static const uint32_t match_log = 5;
> ...
> _mm512_slli_epi32(mi, match_log);
> 
> It can't substitute constant variable 'match_log' with its actual value.
> The fix replaces constant variable with its immediate value.
> 
> Bugzilla ID: 717
> Fixes: b64c2295f7fc ("acl: add 256-bit AVX512 classify method")
> Fixes: 45da22e42ec3 ("acl: add 512-bit AVX512 classify method")
> Cc: sta...@dpdk.org
> 
> Reported-by: Liang Ma 
> Signed-off-by: Konstantin Ananyev 

Applied, thanks





[dpdk-dev] [PATCH v1] net/mlx5: fix IPIP multi tunnel validation

2021-06-16 Thread Lior Margalit
A flow rule must not include multiple tunnel layers.
An attempt to create such a rule, for example:
testpmd> flow create .../ vxlan / eth / ipv4 proto is 4 / end 
results in an unclear error.

In the current implementation there is a check for
multiple IPIP tunnels, but not for combination of IPIP
and a different kind of tunnel, such as VXLAN. The fix
is to enhance the above check to use MLX5_FLOW_LAYER_TUNNEL
that consists of all the tunnel masks. The error message
will be "multiple tunnel not supported".

Fixes: 5e33bebdd8d3 ("net/mlx5: support IP-in-IP tunnel")
Cc: sta...@dpdk.org

Signed-off-by: Lior Margalit 
Acked-by: Ori Kam 
---
 drivers/net/mlx5/mlx5_flow.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e5e062d09a..c5c767aaee 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2124,7 +2124,7 @@ mlx5_flow_validate_item_ipv4(const struct rte_flow_item 
*item,
  RTE_FLOW_ERROR_TYPE_ITEM, item,
  "IPv4 cannot follow L2/VLAN layer "
  "which ether type is not IPv4");
-   if (item_flags & MLX5_FLOW_LAYER_IPIP) {
+   if (item_flags & MLX5_FLOW_LAYER_TUNNEL) {
if (mask && spec)
next_proto = mask->hdr.next_proto_id &
 spec->hdr.next_proto_id;
@@ -2232,7 +2232,7 @@ mlx5_flow_validate_item_ipv6(const struct rte_flow_item 
*item,
  "which ether type is not IPv6");
if (mask && mask->hdr.proto == UINT8_MAX && spec)
next_proto = spec->hdr.proto;
-   if (item_flags & MLX5_FLOW_LAYER_IPV6_ENCAP) {
+   if (item_flags & MLX5_FLOW_LAYER_TUNNEL) {
if (next_proto == IPPROTO_IPIP || next_proto == IPPROTO_IPV6)
return rte_flow_error_set(error, EINVAL,
  RTE_FLOW_ERROR_TYPE_ITEM,
-- 
2.21.0



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Tuesday, 15 June 2021 18.39
> 
> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > This patch introduces 'dmadevice' which is a generic type of DMA
> > device.
> >
> > The APIs of dmadev library exposes some generic operations which can
> > enable configuration and I/O with the DMA devices.
> >
> > Signed-off-by: Chengwen Feng 
> > ---
> Thanks for sending this.
> 
> Of most interest to me right now are the key data-plane APIs. While we
> are
> still in the prototyping phase, below is a draft of what we are
> thinking
> for the key enqueue/perform_ops/completed_ops APIs.
> 
> Some key differences I note in below vs your original RFC:
> * Use of void pointers rather than iova addresses. While using iova's
> makes
>   sense in the general case when using hardware, in that it can work
> with
>   both physical addresses and virtual addresses, if we change the APIs
> to use
>   void pointers instead it will still work for DPDK in VA mode, while
> at the
>   same time allow use of software fallbacks in error cases, and also a
> stub
>   driver than uses memcpy in the background. Finally, using iova's
> makes the
>   APIs a lot more awkward to use with anything but mbufs or similar
> buffers
>   where we already have a pre-computed physical address.
> * Use of id values rather than user-provided handles. Allowing the
> user/app
>   to manage the amount of data stored per operation is a better
> solution, I
>   feel than proscribing a certain about of in-driver tracking. Some
> apps may
>   not care about anything other than a job being completed, while other
> apps
>   may have significant metadata to be tracked. Taking the user-context
>   handles out of the API also makes the driver code simpler.
> * I've kept a single combined API for completions, which differs from
> the
>   separate error handling completion API you propose. I need to give
> the
>   two function approach a bit of thought, but likely both could work.
> If we
>   (likely) never expect failed ops, then the specifics of error
> handling
>   should not matter that much.
> 
> For the rest, the control / setup APIs are likely to be rather
> uncontroversial, I suspect. However, I think that rather than xstats
> APIs,
> the library should first provide a set of standardized stats like
> ethdev
> does. If driver-specific stats are needed, we can add xstats later to
> the
> API.
> 
> Appreciate your further thoughts on this, thanks.
> 
> Regards,
> /Bruce

I generally agree with Bruce's points above.

I would like to share a couple of ideas for further discussion:

1. API for bulk operations.
The ability to prepare a vector of DMA operations, and then post it to the DMA 
driver.

2. Prepare the API for more complex DMA operations than just copy/fill.
E.g. blitter operations like "copy A bytes from the source starting at address 
X, to the destination starting at address Y, masked with the bytes starting at 
address Z, then skip B bytes at the source and C bytes at the destination, 
rewind the mask to the beginning of Z, and repeat D times". This is just an 
example.
I'm suggesting to use a "DMA operation" union structure as parameter to the 
command enqueue function, rather than having individual functions for each 
possible DMA operation.
I know I'm not the only one old enough on the mailing list to have worked with 
the Commodore Amiga's blitter. :-)
DPDK has lots of code using CPU vector instructions to shuffle bytes around. I 
can easily imagine a DMA engine doing similar jobs, possibly implemented in an 
FPGA or some other coprocessor.

-Morten



[dpdk-dev] Re: [dpdk-stable] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64

2021-06-16 Thread Feifei Wang
Hi, everyone

This patch can be closed for the following reasons.

> -----Original Message-----
> From: dev  On Behalf Of Honnappa Nagarahalli
> Sent: 28 March 2021 9:00
> To: tho...@monjalon.net; Takeshi Yoshimura
> 
> Cc: sta...@dpdk.org; dev@dpdk.org; olivier.m...@6wind.com;
> chao...@linux.vnet.ibm.com; konstantin.anan...@intel.com; Jerin Jacob
> ; nd ; nd 
> Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy
> 主题: Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy
> dequeue/enqueue in ppc64
> 
> 
> 
> > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] rte_ring: fix racy
> > dequeue/enqueue in ppc64
> >
> > No reply after more than 2 years.
> > Unfortunately it is probably outdated now.
> > Classified as "Changes Requested".
> Looking at the code, I think this patch in fact fixes a bug. Appreciate 
> rebasing
> this patch.
> 
> The problem is already fixed in '__rte_ring_move_cons_head' but needs to
> be fixed in '__rte_ring_move_prod_head'.
> This problem is fixed for C11 version due to acquire load of cons.tail and
> prod.tail.

First, for the consumer in dequeue:
the reason for adding an rmb in move_cons_head of “generic” is based on
this patch:
http://patches.dpdk.org/project/dpdk/patch/1552409933-45684-2-git-send-email-gavin...@arm.com/

Slot  Consumer                              Producer
1     dequeue elements
2                                           update prod_tail
3     load new prod_tail
4     check room is enough (n < entries)

The dequeue of elements may be reordered before the load of the updated
prod_tail, so the consumer can load incorrect element values.
For the multiple-consumers dequeue case, ‘rte_atomic32_cmpset’ with acquire
and release order can prevent the dequeue from being reordered before the
load of prod_tail, so no extra rmb is needed.

Second, for the single producer in enqueue:

Slot  Producer                              Consumer
1     enqueue elements (not committed)
2                                           update consumer_tail
3     load new consumer_tail
4     check room is enough (n < entries)
5     enqueued elements are committed

Though the enqueue of elements may be reordered before the load of
consumer_tail, these elements will not be committed until the ‘check’ has
finished. So the load-to-write control dependency is reliable and an rmb is
not needed here.
[1] https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf (page:15)

As a result, due to this control dependency it is unnecessary to add an rmb
for the single-producer enqueue, and this patch can be closed.

Best Regards
Feifei
> 
> >
> >
> > 17/07/2018 05:34, Jerin Jacob:
> > > From: Takeshi Yoshimura 
> > >
> > > Cc: olivier.m...@6wind.com
> > > Cc: chao...@linux.vnet.ibm.com
> > > Cc: konstantin.anan...@intel.com
> > >
> > > >
> > > > > Adding rte_smp_rmb() cause performance regression on non x86
> > platforms.
> > > > > Having said that, load-load barrier can be expressed very  well
> > > > > with C11 memory model. I guess ppc64 supports C11 memory model.
> > > > > If so, Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y
> for
> > > > > ppc64 and check original issue?
> > > >
> > > > Yes, the performance regression happens on non-x86 with single
> > > > producer/consumer.
> > > > The average latency of an enqueue was increased from 21 nsec to 24
> > > > nsec in my simple experiment. But, I think it is worth it.
> > >
> > > That varies to machine to machine. What is the burst size etc.
> > >
> > > >
> > > >
> > > > I also tested C11 rte_ring, however, it caused the same race
> > > > condition in
> > ppc64.
> > > > I tried to fix the C11 problem as well, but I also found the C11
> > > > rte_ring had other potential incorrect choices of memory orders,
> > > > which caused another race condition in ppc64.
> > >
> > > Does it happens on all ppc64 machines? Or on a specific machine?
> > > Is following tests are passing on your system without the patch?
> > > test/test/test_ring_perf.c
> > > test/test/test_ring.c
> > >
> > > >
> > > > For example,
> > > > __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(),
> but I
> > > > am not sure why the load-acquire is used for the compare exchange.
> > >
> > > It correct as per C11 acquire and release semantics.
> > >
> > > > Also in update_tail, the pause can be called before the data copy
> > > > because of ht->tail load without atomic_load_n.
> > > >
> > > > The memory order is simply difficult, so it might take a bit
> > > > longer time to check if the code is correct. I think I can fix the
> > > > C11 rte_ring as another patch.
> > > >
> > > > >>
> > > > >> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
> > > > >> It uses a single consumer and multiple producers for a rte_ring.
> > > > >> The problem was a load-load reorder in
> rte_ring_sc_dequeue_bulk().
> > > > >
> > > > > Adding rte_smp_rmb(

Re: [dpdk-dev] [PATCH 00/20] Add Marvell CNXK crypto PMDs

2021-06-16 Thread Akhil Goyal
> Subject: [PATCH 00/20] Add Marvell CNXK crypto PMDs
> 
> Add cnxk crypto PMDs supporting Marvell CN106XX SoC, based on
> 'common/cnxk'.
> 
> This series utilizes 'common/cnxk' to register cn9k & cn10k crypto PMDs and
> add symmetric cryptographic features for the same.
> 
> Depends-on: series-17212 ("Add CPT in Marvell CNXK common driver")
> 
> Ankur Dwivedi (5):
>   crypto/cnxk: add driver skeleton
>   crypto/cnxk: add probe and remove
>   crypto/cnxk: add device control ops
>   crypto/cnxk: add symmetric crypto capabilities
>   crypto/cnxk: add queue pair ops
> 
> Anoob Joseph (5):
>   crypto/cnxk: add session ops framework
>   crypto/cnxk: add enqueue burst op
>   crypto/cnxk: add dequeue burst op
>   crypto/cnxk: add cipher operation in session
>   crypto/cnxk: add auth operation in session
> 
> Archana Muniganti (5):
>   crypto/cnxk: add aead operation in session
>   crypto/cnxk: add chained operation in session
>   crypto/cnxk: add flexi crypto cipher encrypt
>   crypto/cnxk: add flexi crypto cipher decrypt
>   crypto/cnxk: add ZUC and SNOW3G encrypt
> 
> Tejasree Kondoj (5):
>   crypto/cnxk: add ZUC and SNOW3G decrypt
>   crypto/cnxk: add KASUMI encrypt
>   crypto/cnxk: add KASUMI decrypt
>   crypto/cnxk: add digest support
>   test/crypto: enable cnxk crypto PMDs
> 
>  MAINTAINERS   |9 +
>  app/test/meson.build  |2 +
>  app/test/test_cryptodev.c |   14 +
>  app/test/test_cryptodev.h |2 +
>  doc/guides/cryptodevs/features/cn10k.ini  |   62 +
>  doc/guides/cryptodevs/features/cn9k.ini   |   66 +

Please add PMD documentation.

>  drivers/crypto/cnxk/cn10k_cryptodev.c |  147 +
>  drivers/crypto/cnxk/cn10k_cryptodev.h |   13 +
>  drivers/crypto/cnxk/cn10k_cryptodev_ops.c |  357 +++
>  drivers/crypto/cnxk/cn10k_cryptodev_ops.h |   15 +
>  drivers/crypto/cnxk/cn9k_cryptodev.c  |  145 +
>  drivers/crypto/cnxk/cn9k_cryptodev.h  |   13 +
>  drivers/crypto/cnxk/cn9k_cryptodev_ops.c  |  319 +++
>  drivers/crypto/cnxk/cn9k_cryptodev_ops.h  |   14 +
>  drivers/crypto/cnxk/cnxk_cpt_ops_helper.c |   28 +
>  drivers/crypto/cnxk/cnxk_cpt_ops_helper.h |   20 +
>  drivers/crypto/cnxk/cnxk_cryptodev.c  |   33 +
>  drivers/crypto/cnxk/cnxk_cryptodev.h  |   38 +
>  drivers/crypto/cnxk/cnxk_cryptodev_capabilities.c |  755 +
>  drivers/crypto/cnxk/cnxk_cryptodev_capabilities.h |   25 +
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  534 
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |  109 +
>  drivers/crypto/cnxk/cnxk_se.h | 3052 
> +
>  drivers/crypto/cnxk/meson.build   |   22 +
>  drivers/crypto/cnxk/version.map   |3 +
>  drivers/crypto/meson.build|1 +
>  26 files changed, 5798 insertions(+)
>  create mode 100644 doc/guides/cryptodevs/features/cn10k.ini
>  create mode 100644 doc/guides/cryptodevs/features/cn9k.ini
>  create mode 100644 drivers/crypto/cnxk/cn10k_cryptodev.c
>  create mode 100644 drivers/crypto/cnxk/cn10k_cryptodev.h
>  create mode 100644 drivers/crypto/cnxk/cn10k_cryptodev_ops.c
>  create mode 100644 drivers/crypto/cnxk/cn10k_cryptodev_ops.h
>  create mode 100644 drivers/crypto/cnxk/cn9k_cryptodev.c
>  create mode 100644 drivers/crypto/cnxk/cn9k_cryptodev.h
>  create mode 100644 drivers/crypto/cnxk/cn9k_cryptodev_ops.c
>  create mode 100644 drivers/crypto/cnxk/cn9k_cryptodev_ops.h
>  create mode 100644 drivers/crypto/cnxk/cnxk_cpt_ops_helper.c
>  create mode 100644 drivers/crypto/cnxk/cnxk_cpt_ops_helper.h
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev.c
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev.h
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_capabilities.c
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_capabilities.h
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_ops.c
>  create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_ops.h
>  create mode 100644 drivers/crypto/cnxk/cnxk_se.h
>  create mode 100644 drivers/crypto/cnxk/meson.build
>  create mode 100644 drivers/crypto/cnxk/version.map
> 
> --
> 2.7.4



Re: [dpdk-dev] [PATCH 01/20] crypto/cnxk: add driver skeleton

2021-06-16 Thread Akhil Goyal


> 
> +Marvell cnxk
This should be Marvell cnxk crypto as we have net and event PMD
with the same name.

> +M: Ankur Dwivedi 
> +M: Anoob Joseph 
> +M: Tejasree Kondoj 
> +F: drivers/crypto/cnxk/
> +F: doc/guides/cryptodevs/cnxk.rst
> +F: doc/guides/cryptodevs/features/cn9k.ini
> +F: doc/guides/cryptodevs/features/cn10k.ini
> +



Re: [dpdk-dev] [PATCH 01/20] crypto/cnxk: add driver skeleton

2021-06-16 Thread Anoob Joseph
Hi Akhil,

> >
> > +Marvell cnxk
> This should be Marvell cnxk crypto as we have net and event PMD with the same
> name.

[Anoob] Mempool & event already follow this convention for Marvell cnxk. The net
driver (which is in the pipeline) is also adding the same. All Marvell OCTEON TX2
drivers followed the same convention as well. Just changing to 'Marvell cnxk
crypto' here might make it stand out.

I don't mind making the change here if you can confirm it's okay.

> 
> > +M: Ankur Dwivedi 
> > +M: Anoob Joseph 
> > +M: Tejasree Kondoj 
> > +F: drivers/crypto/cnxk/
> > +F: doc/guides/cryptodevs/cnxk.rst
> > +F: doc/guides/cryptodevs/features/cn9k.ini
> > +F: doc/guides/cryptodevs/features/cn10k.ini
> > +

Thanks,
Anoob



Re: [dpdk-dev] [PATCH 01/20] crypto/cnxk: add driver skeleton

2021-06-16 Thread Akhil Goyal
> Hi Akhil,
> 
> > >
> > > +Marvell cnxk
> > This should be Marvell cnxk crypto as we have net and event PMD with the
> same
> > name.
> 
> [Anoob] Mempool & event already follows this convention for Marvell cnxk.
> Net driver (which is in pipeline) is also adding the same. Marvell OCTEON TX2
> all drivers followed the same convention as well. Just changing to 'Marvell
> cnxk crypto' here might make it stand out.
> 
> I don't mind making the change here if you can confirm it's okay.

I think there is no convention followed here,
In case of Octeontx2, I see following in MAINTAINERS
Marvell OCTEON TX2 crypto
Marvell OCTEON TX2 regex
Marvell OCTEON TX2          <- mempool missing here
Marvell OCTEON TX2 DMA
Marvell OCTEON TX2 EP
Marvell OCTEON TX2          <- event missing here
Marvell OCTEON TX2          <- net missing here.

I believe it is better to add crypto here when we have the same PMD name for
all subsystems. It is convenient that way.
@Thomas Monjalon Can you suggest?

Regards,
Akhil


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Bruce Richardson
On Wed, Jun 16, 2021 at 03:17:51AM +0100, Wang, Haiyue wrote:
> > -Original Message-
> > From: dev  On Behalf Of Chengwen Feng
> > Sent: Tuesday, June 15, 2021 21:22
> > To: tho...@monjalon.net; Yigit, Ferruh 
> > Cc: dev@dpdk.org; nipun.gu...@nxp.com; hemant.agra...@nxp.com; 
> > maxime.coque...@redhat.com;
> > honnappa.nagaraha...@arm.com; jer...@marvell.com; 
> > david.march...@redhat.com; Richardson, Bruce
> > ; jerinjac...@gmail.com
> > Subject: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
> >
> > This patch introduces 'dmadevice' which is a generic type of DMA
> > device.
> >
> > The APIs of dmadev library exposes some generic operations which can
> > enable configuration and I/O with the DMA devices.
> >
> > Signed-off-by: Chengwen Feng 
> > ---
> >  lib/dmadev/rte_dmadev.h | 531 
> > 
> >  lib/dmadev/rte_dmadev_pmd.h | 384 
> >  2 files changed, 915 insertions(+)
> >  create mode 100644 lib/dmadev/rte_dmadev.h
> >  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
> >
> > diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> > new file mode 100644
> > index 000..ca7c8a8
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev.h
> > @@ -0,0 +1,531 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2021 HiSilicon Limited.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_H_
> > +#define _RTE_DMADEV_H_
> > +
> > +/**
> > + * @file rte_dmadev.h
> > + *
> > + * DMA (Direct Memory Access) device APIs.
> > + *
> > + * Defines RTE DMA Device APIs for DMA operations and its provisioning.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Get the total number of DMA devices that have been successfully
> > + * initialised.
> > + *
> > + * @return
> > + *   The total number of usable DMA devices.
> > + */
> > +__rte_experimental
> > +uint16_t
> > +rte_dmadev_count(void);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Get the device identifier for the named DMA device.
> > + *
> > + * @param name
> > + *   DMA device name to select the DMA device identifier.
> > + *
> > + * @return
> > + *   Returns DMA device identifier on success.
> > + *   - <0: Failure to find named DMA device.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_get_dev_id(const char *name);
> > +
> 
> Like 'struct rte_pci_device', 'struct rte_vdev_device', and new introduced
> 'struct rte_auxiliary_device', have the "rte_xxx_device" name style,
> How about 'struct rte_dma_device' name ?

One difference is that the pci, vdev and auxiliary devices are all device
types on a bus, rather than devices in a functional class like ethdev,
rawdev, eventdev. I think what is here is fine for now - if you feel
strongly we can revisit later.


Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread Wang, Haiyue
> -Original Message-
> From: Richardson, Bruce 
> Sent: Wednesday, June 16, 2021 16:05
> To: Wang, Haiyue 
> Cc: Chengwen Feng ; tho...@monjalon.net; Yigit, 
> Ferruh
> ; dev@dpdk.org; nipun.gu...@nxp.com; 
> hemant.agra...@nxp.com;
> maxime.coque...@redhat.com; honnappa.nagaraha...@arm.com; jer...@marvell.com;
> david.march...@redhat.com; jerinjac...@gmail.com; Xia, Chenbo 
> 
> Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
> 
> On Wed, Jun 16, 2021 at 03:17:51AM +0100, Wang, Haiyue wrote:
> > > -Original Message-
> > > From: dev  On Behalf Of Chengwen Feng
> > > Sent: Tuesday, June 15, 2021 21:22
> > > To: tho...@monjalon.net; Yigit, Ferruh 
> > > Cc: dev@dpdk.org; nipun.gu...@nxp.com; hemant.agra...@nxp.com; 
> > > maxime.coque...@redhat.com;
> > > honnappa.nagaraha...@arm.com; jer...@marvell.com; 
> > > david.march...@redhat.com; Richardson, Bruce
> > > ; jerinjac...@gmail.com
> > > Subject: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
> > >
> > > This patch introduces 'dmadevice' which is a generic type of DMA
> > > device.
> > >
> > > The APIs of dmadev library exposes some generic operations which can
> > > enable configuration and I/O with the DMA devices.
> > >
> > > Signed-off-by: Chengwen Feng 
> > > ---
> > >  lib/dmadev/rte_dmadev.h | 531 
> > > 
> > >  lib/dmadev/rte_dmadev_pmd.h | 384 
> > >  2 files changed, 915 insertions(+)
> > >  create mode 100644 lib/dmadev/rte_dmadev.h
> > >  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
> > >
> > > diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> > > new file mode 100644
> > > index 000..ca7c8a8
> > > --- /dev/null
> > > +++ b/lib/dmadev/rte_dmadev.h
> > > @@ -0,0 +1,531 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright 2021 HiSilicon Limited.
> > > + */
> > > +
> > > +#ifndef _RTE_DMADEV_H_
> > > +#define _RTE_DMADEV_H_
> > > +
> > > +/**
> > > + * @file rte_dmadev.h
> > > + *
> > > + * DMA (Direct Memory Access) device APIs.
> > > + *
> > > + * Defines RTE DMA Device APIs for DMA operations and its provisioning.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Get the total number of DMA devices that have been successfully
> > > + * initialised.
> > > + *
> > > + * @return
> > > + *   The total number of usable DMA devices.
> > > + */
> > > +__rte_experimental
> > > +uint16_t
> > > +rte_dmadev_count(void);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Get the device identifier for the named DMA device.
> > > + *
> > > + * @param name
> > > + *   DMA device name to select the DMA device identifier.
> > > + *
> > > + * @return
> > > + *   Returns DMA device identifier on success.
> > > + *   - <0: Failure to find named DMA device.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_dmadev_get_dev_id(const char *name);
> > > +
> >
> > Like 'struct rte_pci_device', 'struct rte_vdev_device', and new introduced
> > 'struct rte_auxiliary_device', have the "rte_xxx_device" name style,
> > How about 'struct rte_dma_device' name ?
> 
> One difference is that the pci, vdev and auxiliary devices are all devices
> types on a bus, rather than devices in a functional class like ethdev,
> rawdev, eventdev. I think what is here is fine for now - if you feel

From this point of view, yes, it's fine. Thanks, Bruce.

> strongly we can revisit later.


Re: [dpdk-dev] [dpdk-stable] [PATCH] ipc: stop mp control thread on cleanup

2021-06-16 Thread David Marchand
On Mon, Jun 14, 2021 at 11:13 AM David Marchand
 wrote:
>
> When calling rte_eal_cleanup, the mp channel cleanup routine only sets
> mp_fd to -1 leaving the rte_mp_handle control thread running.
> This control thread can spew warnings on reading on an invalid fd.
>
> To handle this situation, sets mp_fd to -1 to signal the control thread
> it should exit, but since this thread might be sleeping on the socket,
> cancel the thread too.
>
> Fixes: 85d6815fa6d0 ("eal: close multi-process socket during cleanup")
> Cc: sta...@dpdk.org
>

Reported-by: Owen Hilyard 

> Signed-off-by: David Marchand 

Anatoly, review please.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v1] net/mlx5: fix IPIP multi tunnel validation

2021-06-16 Thread Matan Azrad



From: Lior Margalit
> A flow rule must not include multiple tunnel layers.
> An attempt to create such a rule, for example:
> testpmd> flow create .../ vxlan / eth / ipv4 proto is 4 / end 
> results in an unclear error.
> 
> In the current implementation there is a check for multiple IPIP tunnels, but
> not for combination of IPIP and a different kind of tunnel, such as VXLAN.
> The fix is to enhance the above check to use MLX5_FLOW_LAYER_TUNNEL
> that consists of all the tunnel masks. The error message will be "multiple
> tunnel not supported".
> 
> Fixes: 5e33bebdd8d3 ("net/mlx5: support IP-in-IP tunnel")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Lior Margalit 
> Acked-by: Ori Kam 
Acked-by: Matan Azrad 


Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK

2021-06-16 Thread Lin, Xueqin
> -Original Message-
> From: Jerin Jacob 
> Sent: Tuesday, June 15, 2021 4:40 PM
> To: Peng, ZhihongX 
> Cc: Burakov, Anatoly ; Ananyev, Konstantin
> ; Stephen Hemminger
> ; dpdk-dev ; Lin, Xueqin
> 
> Subject: Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK
> 
> On Tue, Jun 15, 2021 at 1:46 PM  wrote:
> >
> > From: Zhihong Peng 
> >
> > AddressSanitizer (ASan) is a standard Google tool for detecting memory
> > errors. It could help to detect use-after-free and {heap,stack,global}-buffer
> > overflow bugs in C/C++ programs, print detailed error information when
> > an error happens, and largely improve debugging efficiency.
> >
> > By referring to its implementation algorithm
> > (https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm),
> > we ported the heap-buffer-overflow and use-after-free functions to DPDK.
> >
> > Here is an example of heap-buffer-overflow bug:
> > ..
> > char *p = rte_zmalloc(NULL, 7, 0);
> > p[7] = 'a';
> > ..
> >
> > Here is an example of use-after-free bug:
> > ..
> > char *p = rte_zmalloc(NULL, 7, 0);
> > rte_free(p);
> > *p = 'a';
> > ..
> >
> > If you want to use this feature,
> > you need to use the following compilation options:
> > -Db_lundef=false -Db_sanitize=address
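For reference, a full build invocation with these flags might look as follows (the build directory name here is illustrative, not mandated by the patch):

```shell
# Configure a DPDK build with AddressSanitizer instrumentation.
# -Db_sanitize=address enables ASan; -Db_lundef=false is needed so the
# link step tolerates the sanitizer runtime symbols resolved at load time.
meson setup build -Db_lundef=false -Db_sanitize=address
ninja -C build
```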
> 
> # Thanks for this patch. It is a useful item.
> 
> # Subject could be changed
> from:
> porting AddressSanitizer feature to DPDK to
> eal: support for  AddressSanitizer
> or so

Thanks for your positive feedback and review.
Good point, we will update the title in the next version.

> 
> # Could you add a section in the documentation for Sanitizers to document the
> build time option and other points that users need to know.

It makes sense to add the build option and key points to the documentation;
we will add this part in the doc folder.

> We can add other sanitizers such as UBSan etc in the future here
Work in progress: researching other sanitizer tools.
> 
> # Add a UT test case to make sure it is working in app/test or so.

This tool helps detect memory issues, so a test would need deliberately bad
code to verify that it works.
We suggest listing demo code and the tool's captured output so users can try
the tool themselves, and adding this part into the doc as well.

> 
> # Also, Please update the release note for this feature.
Sure, we can update the release note once the code is merged. 



Re: [dpdk-dev] Memory leak in rte_pci_scan

2021-06-16 Thread David Marchand
On Tue, Jun 15, 2021 at 5:16 PM Owen Hilyard  wrote:
>
> The issue may have been the interactive docker session I was running it in. 
> The last few tests (150-157) were all taking until the timeout the lab uses 
> for unit tests (2 hours since the timeout was multiplied by 10). I had to 
> leave for the day so I restarted it in a non-interactive container and it ran 
> in 2 hours. If we were to just run the fast-tests suite, then it would have 
> taken 42 minutes to run. This is mostly due to timeouts in 
> eal_flags_c_opt_autotest, eal_flags_hpet_autotest, eal_flags_misc_autotest 
> and multiprocess_autotest, each taking 600 seconds. Finding out what caused 
> these to stall would bring the runtime down to 3 minutes. All of the failures 
> should be ASAN-related.

- The perf-tests testsuite is heavy and is probably not suited for per
patchset runs.
It may be worth running every once in a while but at the project/main
branches level.


- For the fast-tests testsuite, the default timeout should be 10s, not 600s.
See timeout_seconds_fast,
https://git.dpdk.org/dpdk/tree/app/test/meson.build#n446
Odd that a 600s timeout has been applied to fast-tests in your run.
How do you invoke meson?


- There is an interesting trace that shows issues with starting
secondary processes in this environment.
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket
/var/run/dpdk/eal_flags_c_opt_autotest/mp_socket_264_50a7b93b648fa
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Cannot get a virtual area at requested address: 0x7f1a8f40
(got 0x7f19dc1fe000)
EAL: Cannot reserve 17179869184 bytes at [0x7f1a8f40] - please use
'--base-virtaddr' option
EAL: Cannot preallocate VA space for hugepage memory
EAL: FATAL: Cannot init memory
EAL: Cannot init memory
EAL: Cannot destroy local memory map
EAL: Could not release memory subsystem data

It seems like there are multiple dpdk processes running in parallel in this
environment.
Any idea of what is happening on your system at the moment you tried
to run this test?


-- 
David Marchand



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread fengchengwen
On 2021/6/16 0:38, Bruce Richardson wrote:
> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
>> This patch introduces 'dmadevice' which is a generic type of DMA
>> device.
>>
>> The APIs of dmadev library exposes some generic operations which can
>> enable configuration and I/O with the DMA devices.
>>
>> Signed-off-by: Chengwen Feng 
>> ---
> Thanks for sending this.
> 
> Of most interest to me right now are the key data-plane APIs. While we are
> still in the prototyping phase, below is a draft of what we are thinking
> for the key enqueue/perform_ops/completed_ops APIs.
> 
> Some key differences I note in below vs your original RFC:
> * Use of void pointers rather than iova addresses. While using iova's makes
>   sense in the general case when using hardware, in that it can work with
>   both physical addresses and virtual addresses, if we change the APIs to use
>   void pointers instead it will still work for DPDK in VA mode, while at the
>   same time allow use of software fallbacks in error cases, and also a stub
>   driver than uses memcpy in the background. Finally, using iova's makes the
>   APIs a lot more awkward to use with anything but mbufs or similar buffers
>   where we already have a pre-computed physical address.

The iova is a hint to the application, and is widely used in DPDK.
If we switch to void pointers, how is the address passed (iova or just va)?
This may introduce implementation dependencies here.

Alternatively, always pass the va and have the driver perform address
translation, but I think this translation may cost too much CPU.

> * Use of id values rather than user-provided handles. Allowing the user/app
>   to manage the amount of data stored per operation is a better solution, I
>   feel than proscribing a certain about of in-driver tracking. Some apps may
>   not care about anything other than a job being completed, while other apps
>   may have significant metadata to be tracked. Taking the user-context
>   handles out of the API also makes the driver code simpler.

The user-provided handle was mainly intended to simplify the application
implementation; it provides the ability to quickly locate contexts.

The "use of id values" seem like the dma_cookie of Linux DMA engine framework,
user will get a unique dma_cookie after calling dmaengine_submit(), and then
could use it to call dma_async_is_tx_complete() to get completion status.

How about defining the copy prototype as follows:
  dma_cookie_t rte_dmadev_copy(uint16_t dev_id, xxx)
where dma_cookie_t is an int32 and is monotonically increasing; a value >= 0
means the enqueue succeeded, otherwise it failed.
On completion, the dmadev will return the latest completed dma_cookie, and the
application can use that dma_cookie to quickly locate contexts.

> * I've kept a single combined API for completions, which differs from the
>   separate error handling completion API you propose. I need to give the
>   two function approach a bit of thought, but likely both could work. If we
>   (likely) never expect failed ops, then the specifics of error handling
>   should not matter that much.

The rte_ioat_completed_ops API is too complex, and considering that some
applications may never see a copy fail, we split it into two APIs.
It's indeed not friendly to other scenarios that always require error handling.

I prefer using the number of completed operations as the return value rather
than the ID, so that the application can simply judge whether there are new
completed operations. The new prototype:
 uint16_t rte_dmadev_completed(uint16_t dev_id, dma_cookie_t *cookie,
 uint32_t *status, uint16_t max_status, uint16_t *num_fails);

1) for normal case which never expect failed ops:
   just call: ret = rte_dmadev_completed(dev_id, &cookie, NULL, 0, NULL);
2) for other case:
   ret = rte_dmadev_completed(dev_id, &cookie, &status, max_status, &fails);
   at this point the fails <= ret <= max_status

> 
> For the rest, the control / setup APIs are likely to be rather
> uncontroversial, I suspect. However, I think that rather than xstats APIs,
> the library should first provide a set of standardized stats like ethdev
> does. If driver-specific stats are needed, we can add xstats later to the
> API.

Agree, will fix in v2

> 
> Appreciate your further thoughts on this, thanks.
> 
> Regards,
> /Bruce
> 
> /**
>  * @warning
>  * @b EXPERIMENTAL: this API may change without prior notice.
>  *
>  * Enqueue a copy operation onto the DMA device
>  *
>  * This queues up a copy operation to be performed by hardware, but does not
>  * trigger hardware to begin that operation.
>  *
>  * @param dev_id
>  *   The dmadev device id of the DMA instance
>  * @param src
>  *   The source buffer
>  * @param dst
>  *   The destination buffer
>  * @param length
>  *   The length of the data to be copied
>  * @return
>  *   - On success, id (uint16_t) of job enqueued
>  *   - On failure, negative error code
>  */
> static inline int
> __rte_experimental
> rte_dmadev_enqueue_copy(uint16_t dev_id, void * src, void * dst, unsigned int 
> length);
> 
>

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-16 Thread Jerin Jacob
On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon  wrote:
>
> 14/06/2021 17:48, Morten Brørup:
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > It would be much simpler to just increase RTE_MAX_ETHPORTS to something big 
> > enough to hold a sufficiently large array. And possibly add an 
> > rte_max_ethports variable to indicate the number of populated entries in 
> > the array, for use when iterating over the array.
> >
> > Can we come up with another example than RTE_MAX_ETHPORTS where this 
> > library provides a better benefit?
>
> What is big enough?
> Is 640KB enough for RAM? ;)

If I understand it correctly, a Linux process allocates 640KB due to the
fact that currently
struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is global and lives
in BSS.

If we allocate this from the heap, i.e. use malloc() to allocate this
memory, then in my understanding Linux
won't actually allocate real pages to back the memory until
someone writes to or reads from it.

i.e. it will be free virtual memory, courtesy of Linux memory management.
If so, we can keep large values for RTE_MAX_ETHPORTS
without wasting any "real" memory even though the system has only a few ports.

Thoughts?



>
> When dealing with microservices switching, the numbers can increase very fast.
>
>


Re: [dpdk-dev] [PATCH 04/20] crypto/cnxk: add symmetric crypto capabilities

2021-06-16 Thread Akhil Goyal
> Subject: [PATCH 04/20] crypto/cnxk: add symmetric crypto capabilities
> 
> From: Ankur Dwivedi 
> 
> Add symmetric crypto capabilities for cn9k & cn10k.
> 

Capability patch can also be added in the end along with documentation
update in the .ini files after the data path is added.


> +++ b/drivers/crypto/cnxk/cnxk_cryptodev_capabilities.c
> @@ -0,0 +1,755 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +
> +#include 
> +
> +#include "roc_api.h"
> +
> +#include "cnxk_cryptodev.h"
> +#include "cnxk_cryptodev_capabilities.h"
> +
> +#define CPT_CAPS_ADD(cnxk_caps, cur_pos, hw_caps, name)  
>   \
> + do {   \
> + if ((hw_caps[CPT_ENG_TYPE_SE].name) || \
> + (hw_caps[CPT_ENG_TYPE_IE].name) || \
> + (hw_caps[CPT_ENG_TYPE_AE].name))   \
> + cpt_caps_add(cnxk_caps, cur_pos, caps_##name,
> \
> +  RTE_DIM(caps_##name));\
> + } while (0)
> +
> +static const struct rte_cryptodev_capabilities caps_mul[] = {
> + {   /* RSA */
> + .op = RTE_CRYPTO_OP_TYPE_ASYMMETRIC,

Patch description says sym capabilities are added, but these are asym.



[dpdk-dev] [PATCH v2] bus: clarify log for non-NUMA-aware devices

2021-06-16 Thread Dmitry Kozlyuk
PCI and vmbus drivers printed a warning
when the NUMA node had been reported as (-1) or not reported by the OS:

EAL:   Invalid NUMA socket, default to 0

This message and its level might confuse users, because configuration
is valid and nothing happens that requires attention or intervention.

Reduce the level to INFO, reword the message, and suppress it when there is
only one NUMA node, because NUMA awareness does not matter in this case.

Fixes: f0e0e86aa35d ("pci: move NUMA node check from scan to probe")
Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: sta...@dpdk.org

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Viacheslav Ovsiienko 
Reviewed-by: Xueming Li 
---
v2: Add NUMA node count check (Stephen Hemminger).

 doc/guides/nics/ena.rst  | 2 +-
 drivers/bus/pci/pci_common.c | 4 ++--
 drivers/bus/vmbus/vmbus_common.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 0f1f63f722..694ce1da74 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -234,7 +234,7 @@ Example output:
 
[...]
EAL: PCI device :00:06.0 on NUMA socket -1
-   EAL:   Invalid NUMA socket, default to 0
+   EAL:   Device is not NUMA-aware, defaulting socket to 0
EAL:   probe driver: 1d0f:ec20 net_ena
 
Interactive-mode selected
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 35d7d092d1..0bb56d9b7f 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -189,8 +189,8 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
return 1;
}
 
-   if (dev->device.numa_node < 0) {
-   RTE_LOG(WARNING, EAL, "  Invalid NUMA socket, default to 0\n");
+   if (rte_socket_count() > 1 && dev->device.numa_node < 0) {
+   RTE_LOG(INFO, EAL, "  Device is not NUMA-aware, defaulting 
socket to 0\n");
dev->device.numa_node = 0;
}
 
diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
index d25fd14ef5..5b654b0289 100644
--- a/drivers/bus/vmbus/vmbus_common.c
+++ b/drivers/bus/vmbus/vmbus_common.c
@@ -111,8 +111,8 @@ vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
/* reference driver structure */
dev->driver = dr;
 
-   if (dev->device.numa_node < 0) {
-   VMBUS_LOG(WARNING, "  Invalid NUMA socket, default to 0");
+   if (rte_socket_count() > 1 && dev->device.numa_node < 0) {
+   VMBUS_LOG(INFO, "  Device is not NUMA-aware, defaulting socket 
to 0\n");
dev->device.numa_node = 0;
}
 
-- 
2.18.2



Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-16 Thread fengchengwen
On 2021/6/16 15:09, Morten Brørup wrote:
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
>> Sent: Tuesday, 15 June 2021 18.39
>>
>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
>>> This patch introduces 'dmadevice' which is a generic type of DMA
>>> device.
>>>
>>> The APIs of dmadev library exposes some generic operations which can
>>> enable configuration and I/O with the DMA devices.
>>>
>>> Signed-off-by: Chengwen Feng 
>>> ---
>> Thanks for sending this.
>>
>> Of most interest to me right now are the key data-plane APIs. While we
>> are
>> still in the prototyping phase, below is a draft of what we are
>> thinking
>> for the key enqueue/perform_ops/completed_ops APIs.
>>
>> Some key differences I note in below vs your original RFC:
>> * Use of void pointers rather than iova addresses. While using iova's
>> makes
>>   sense in the general case when using hardware, in that it can work
>> with
>>   both physical addresses and virtual addresses, if we change the APIs
>> to use
>>   void pointers instead it will still work for DPDK in VA mode, while
>> at the
>>   same time allow use of software fallbacks in error cases, and also a
>> stub
>>   driver than uses memcpy in the background. Finally, using iova's
>> makes the
>>   APIs a lot more awkward to use with anything but mbufs or similar
>> buffers
>>   where we already have a pre-computed physical address.
>> * Use of id values rather than user-provided handles. Allowing the
>> user/app
>>   to manage the amount of data stored per operation is a better
>> solution, I
>>   feel than proscribing a certain about of in-driver tracking. Some
>> apps may
>>   not care about anything other than a job being completed, while other
>> apps
>>   may have significant metadata to be tracked. Taking the user-context
>>   handles out of the API also makes the driver code simpler.
>> * I've kept a single combined API for completions, which differs from
>> the
>>   separate error handling completion API you propose. I need to give
>> the
>>   two function approach a bit of thought, but likely both could work.
>> If we
>>   (likely) never expect failed ops, then the specifics of error
>> handling
>>   should not matter that much.
>>
>> For the rest, the control / setup APIs are likely to be rather
>> uncontroversial, I suspect. However, I think that rather than xstats
>> APIs,
>> the library should first provide a set of standardized stats like
>> ethdev
>> does. If driver-specific stats are needed, we can add xstats later to
>> the
>> API.
>>
>> Appreciate your further thoughts on this, thanks.
>>
>> Regards,
>> /Bruce
> 
> I generally agree with Bruce's points above.
> 
> I would like to share a couple of ideas for further discussion:
> 
> 1. API for bulk operations.
> The ability to prepare a vector of DMA operations, and then post it to the 
> DMA driver.

We considered bulk operations and finally decided not to support them:
1. The DMA engine is not applicable to small-packet scenarios which have high 
PPS.
   PS: The vector is suitable for high PPS.
2. To support posting bulk ops, we would need to define a standard struct like
   rte_mbuf, and the application may need to init the struct fields and pass
   them as a pointer array; this may cost too much CPU.
3. Posting a request is simpler than processing completed operations, and the
   CPU write performance is also good. The driver could use vectors to
   accelerate the processing of completed operations.

> 
> 2. Prepare the API for more complex DMA operations than just copy/fill.
> E.g. blitter operations like "copy A bytes from the source starting at 
> address X, to the destination starting at address Y, masked with the bytes 
> starting at address Z, then skip B bytes at the source and C bytes at the 
> destination, rewind the mask to the beginning of Z, and repeat D times". This 
> is just an example.
> I'm suggesting to use a "DMA operation" union structure as parameter to the 
> command enqueue function, rather than having individual functions for each 
> possible DMA operation.

There are many situations in which it would be hard to define such a structure;
I prefer separate APIs like copy/fill/...
PS: I saw that struct dma_device (Linux dmaengine.h) also supports various
prep_xxx APIs.

> I know I'm not the only one old enough on the mailing list to have worked 
> with the Commodore Amiga's blitter. :-)
> DPDK has lots of code using CPU vector instructions to shuffle bytes around. 
> I can easily imagine a DMA engine doing similar jobs, possibly implemented in 
> an FPGA or some other coprocessor.
> 
> -Morten