Re: [dpdk-dev] [PATCH] lpm6: Fix missing ^ in documentation.

2021-09-14 Thread David Marchand
On Mon, Sep 13, 2021 at 8:47 PM Ben Pfaff  wrote:
>

This is probably due to conversion from ms word to rst format.

Fixes: fc1f2750a3ec ("doc: programmers guide")
Cc: sta...@dpdk.org

> Signed-off-by: Ben Pfaff 

Reviewed-by: David Marchand 


-- 
David Marchand



Re: [dpdk-dev] [PATCH] efd: change data type of parameter

2021-09-14 Thread David Marchand
On Fri, Sep 10, 2021 at 6:54 PM Pablo de Lara
 wrote:
>
> rte_efd_create() function was using uint8_t for a socket bitmask,
> for one of its parameters.
> This limits the maximum of NUMA sockets to be 8.
> Changing to to uint64_t increases it to 64, which should be
> more future-proof.

Cc: ppc maintainer, since I think powerX servers have non contiguous
NUMA sockets.


>
> Coverity issue: 366390
> Fixes: 56b6ef874f8 ("efd: new Elastic Flow Distributor library")
>
> Signed-off-by: Pablo de Lara 
> ---
>
> This fix requires an API breakage and therefore it is not
> a good candidate for backporting (besides, it is a very low impact bug).
> Hence, I am not CC'ing stable.

This is an unannounced breakage for a stable API.
Cc: techboard + Ray for awareness.


>
> ---
>
>  lib/efd/rte_efd.c | 2 +-
>  lib/efd/rte_efd.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
> index 77f46809f8..68a2378e88 100644
> --- a/lib/efd/rte_efd.c
> +++ b/lib/efd/rte_efd.c
> @@ -495,7 +495,7 @@ efd_search_hash(struct rte_efd_table * const table,
>
>  struct rte_efd_table *
>  rte_efd_create(const char *name, uint32_t max_num_rules, uint32_t key_len,
> -   uint8_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket)
> +   uint64_t online_cpu_socket_bitmask, uint8_t 
> offline_cpu_socket)
>  {
> struct rte_efd_table *table = NULL;
> uint8_t *key_array = NULL;
> diff --git a/lib/efd/rte_efd.h b/lib/efd/rte_efd.h
> index c2be4c09ae..d3d7befd0c 100644
> --- a/lib/efd/rte_efd.h
> +++ b/lib/efd/rte_efd.h
> @@ -139,7 +139,7 @@ typedef uint16_t efd_hashfunc_t;
>   */
>  struct rte_efd_table *
>  rte_efd_create(const char *name, uint32_t max_num_rules, uint32_t key_len,
> -   uint8_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket);
> +   uint64_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket);
>
>  /**
>   * Releases the resources from an EFD table
> --
> 2.25.1
>


-- 
David Marchand



[dpdk-dev] [PATCH v1] eventdev: update rx timestamp in mbuf using mbuf dynamic field

2021-09-14 Thread Ganapati Kundapura
Add support to register timestamp dynamic field in mbuf.

Update the timestamp in mbuf for each packet before enqueuing
to event device if the timestamp is not already set.

Adding the timestamp in Rx adapter avoids additional latency
due to the event device.

Signed-off-by: Ganapati Kundapura 
---
 lib/eventdev/rte_event_eth_rx_adapter.c | 35 +
 1 file changed, 35 insertions(+)

diff --git a/lib/eventdev/rte_event_eth_rx_adapter.c 
b/lib/eventdev/rte_event_eth_rx_adapter.c
index de8ab05..9cb2550 100644
--- a/lib/eventdev/rte_event_eth_rx_adapter.c
+++ b/lib/eventdev/rte_event_eth_rx_adapter.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rte_eventdev.h"
 #include "eventdev_pmd.h"
@@ -240,6 +241,17 @@ struct eth_rx_queue_info {
 
 static struct rte_event_eth_rx_adapter **event_eth_rx_adapter;
 
+/* Enable dynamic timestamp field in mbuf */
+uint64_t event_eth_rx_timestamp_dynflag;
+int event_eth_rx_timestamp_dynfield_offset = -1;
+
+static inline rte_mbuf_timestamp_t *
+rte_event_eth_rx_timestamp_dynfield(struct rte_mbuf *mbuf)
+{
+   return RTE_MBUF_DYNFIELD(mbuf,
+   event_eth_rx_timestamp_dynfield_offset, rte_mbuf_timestamp_t *);
+}
+
 static inline int
 rxa_validate_id(uint8_t id)
 {
@@ -890,8 +902,18 @@ rxa_buffer_mbufs(struct rte_event_eth_rx_adapter 
*rx_adapter,
int do_rss;
uint16_t nb_cb;
uint16_t dropped;
+   uint64_t ts, ts_mask;
 
if (!eth_rx_queue_info->ena_vector) {
+   ts = m->ol_flags & event_eth_rx_timestamp_dynflag ?
+   0 : rte_get_tsc_cycles();
+
+   /* 0x    if PKT_RX_TIMESTAMP is set,
+* otherwise 0
+*/
+   ts_mask = (uint64_t)(!(m->ol_flags &
+  event_eth_rx_timestamp_dynflag)) - 1ULL;
+
/* 0x  if PKT_RX_RSS_HASH is set, otherwise 0 */
rss_mask = ~(((m->ol_flags & PKT_RX_RSS_HASH) != 0) - 1);
do_rss = !rss_mask && !eth_rx_queue_info->flow_id_mask;
@@ -899,6 +921,11 @@ rxa_buffer_mbufs(struct rte_event_eth_rx_adapter 
*rx_adapter,
struct rte_event *ev;
 
m = mbufs[i];
+   *rte_event_eth_rx_timestamp_dynfield(m) =
+   ts |
+   (*rte_event_eth_rx_timestamp_dynfield(m) &
+   ts_mask);
+
ev = &buf->events[new_tail];
 
rss = do_rss ? rxa_do_softrss(m, rx_adapter->rss_key_be)
@@ -2256,6 +2283,14 @@ rte_event_eth_rx_adapter_create_ext(uint8_t id, uint8_t 
dev_id,
event_eth_rx_adapter[id] = rx_adapter;
if (conf_cb == rxa_default_conf_cb)
rx_adapter->default_cb_arg = 1;
+
+   if (rte_mbuf_dyn_rx_timestamp_register(
+   &event_eth_rx_timestamp_dynfield_offset,
+   &event_eth_rx_timestamp_dynflag) != 0) {
+   RTE_EDEV_LOG_ERR("Error registering timestamp field in mbuf\n");
+   return -rte_errno;
+   }
+
rte_eventdev_trace_eth_rx_adapter_create(id, dev_id, conf_cb,
conf_arg);
return 0;
-- 
2.6.4



Re: [dpdk-dev] [PATCH 2/3] app/testpmd: fix RSS hash type update

2021-09-14 Thread Nélio Laranjeiro
+Shahaf,

Hi Maxime,

On Mon, Sep 13, 2021 at 11:41:04AM +0200, Maxime Coquelin wrote:
> Hi Nélio,
> 
> On 9/10/21 4:16 PM, Nélio Laranjeiro wrote:
> > On Fri, Sep 10, 2021 at 01:06:53PM +0300, Andrew Rybchenko wrote:
> >> On 9/10/21 12:57 PM, Maxime Coquelin wrote:
> >>>
> >>>
> >>> On 9/10/21 11:51 AM, Andrew Rybchenko wrote:
>  On 9/10/21 12:17 PM, Maxime Coquelin wrote:
> > port_rss_hash_key_update() initializes rss_conf with the
> > RSS hash type and key provided by the user, but it calls
> > rte_eth_dev_rss_hash_conf_get() before calling
> > rte_eth_dev_rss_hash_update(), which overides the parsed
> > config with current NIC's config.
> >
> > While the RSS key value is set again after, this is not
> > the case of the key length and the type of hash.
> >
> > There is no need to read the RSS config from the NIC, let's
> > just try to set the user defined one.
> >
> > Fixes: 8205e241b2b0 ("app/testpmd: add missing type to RSS hash 
> > commands")
> > Cc: sta...@dpdk.org
> > Cc: nelio.laranje...@6wind.com
> >
> > Signed-off-by: Maxime Coquelin 
> > ---
> >  app/test-pmd/config.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index 31d8ba1b91..451bda53b1 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -2853,18 +2853,14 @@ port_rss_hash_key_update(portid_t port_id, char 
> > rss_type[], uint8_t *hash_key,
> > int diag;
> > unsigned int i;
> >  
> > -   rss_conf.rss_key = NULL;
> > +   rss_conf.rss_key = hash_key;
> > rss_conf.rss_key_len = hash_key_len;
> > rss_conf.rss_hf = 0;
> > for (i = 0; rss_type_table[i].str; i++) {
> > if (!strcmp(rss_type_table[i].str, rss_type))
> > rss_conf.rss_hf = rss_type_table[i].rss_type;
> > }
> > -   diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
> > -   if (diag == 0) {
> > -   rss_conf.rss_key = hash_key;
> > -   diag = rte_eth_dev_rss_hash_update(port_id, &rss_conf);
> > -   }
> > +   diag = rte_eth_dev_rss_hash_update(port_id, &rss_conf);
> 
>  I'm not 100% sure, but I'd say the intent above could be
>  to update key only as the function name says. I.e. keep
>  rss_hf as is. That could be the reason to get first.
> > 
> > True,
> > 
> >>> I think that was the intial purpose of the command, but patch
> >>> 8205e241b2b0 added setting the hash type as mandatory. There are
> >>> no other command to configure the hash type from testpmd AFAICT.
> > 
> > Also for the same initial purpose, some NIC have an hash key per
> > protocol, by default it uses the same key for all of them but it can be
> > configured individually making for example key0 for all protocols expect
> > IPv4 which uses key1.
> 
> Thanks for the info, I have looked at most drivers but didn't found one
> that support this feature, could you give some pointer?

Well, I have done the modification at that time for MLX5 PMD, since I
left DPDK in 2018 I don't know if they still support such configuration
from this API or if they fully moved to rte_flow.

> Given how the drivers implément the callback, do you agree with the fix,
> or do you have something else in mind?

I cannot answer if this get() is mandatory, this predates my arrival on
DPDK (original commit written in 2014), looking at DPDK state on 
 commit f79959ea1504 ("app/testpmd: allow to configure RSS hash key").
Maybe someone from Intel can help, eventually you can contact PMD
maintainers to review this patch.

Regards,
Nélio

> Thanks,
> Maxime
> 
> >>> Also, even without 8205e241b2b0, the function was broken because the
> >>> key length was overiden.
> >>
> >> I see, many thanks for explanations.
> > 
> 

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH] net/virtio: remove handling of zero desc number on RxQ setup

2021-09-14 Thread Maxime Coquelin



On 9/14/21 8:40 AM, Andrew Rybchenko wrote:
> On 9/13/21 10:25 PM, Maxime Coquelin wrote:
>>
>>
>> On 8/20/21 2:47 PM, Andrew Rybchenko wrote:
>>> From: Ivan Ilchenko 
>>>
>>> Rx queue setup callback allows to use the whole ring when
>>> descriptor number argument equals zero. There's no point to
>>> handle zero in any way since RTE Rx queue setup function
>>> rte_eth_rx_queue_setup() doesn't pass zero using fallback
>>> values.
>>>
>>> Fixes: 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Ivan Ilchenko 
>>> Signed-off-by: Andrew Rybchenko 
>>> ---
>>>  drivers/net/virtio/virtio_rxtx.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c 
>>> b/drivers/net/virtio/virtio_rxtx.c
>>> index 8a48fba5cc..18f03c9fc9 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -706,7 +706,7 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
>>> }
>>> vq->vq_free_thresh = rx_free_thresh;
>>>  
>>> -   if (nb_desc == 0 || nb_desc > vq->vq_nentries)
>>> +   if (nb_desc > vq->vq_nentries)
>>> nb_desc = vq->vq_nentries;
>>> vq->vq_free_cnt = RTE_MIN(vq->vq_free_cnt, nb_desc);
>>>  
>>>
>>
>> Is that really a fix?
>> I see it more like an optimization in a cold path, so maybe it is not
>> worth backporting?
> 
> The main idea is not an optimization, but simplification of
> the code to make it easier to understand. Less special
> cases is better.
> 
> I agree that it does not make sense to backport it.

Ok, thanks. I'll will remove the Fixes tag while applying, no need to
resubmit.

Maxime
> 
>> Other than that:
>> Reviewed-by: Maxime Coquelin 
> 
> Thanks,
> Andrew.
> 



Re: [dpdk-dev] [PATCH] net/virtio: report max/min/align desc limits in dev info get

2021-09-14 Thread Maxime Coquelin



On 9/14/21 8:43 AM, Andrew Rybchenko wrote:
> On 9/13/21 10:52 PM, Maxime Coquelin wrote:
>>
>>
>> On 8/20/21 2:48 PM, Andrew Rybchenko wrote:
>>> From: Ivan Ilchenko 
>>>
>>> Report max/min/align descriptors limits in device info get callback.
>>> Before calling the callback, rte_eth_dev_info_get() provides
>>> default values of nb_min as zero and nb_max as UINT16_MAX that are
>>> not correct for the driver, so one can't rely on them.
>>>
>>> Signed-off-by: Ivan Ilchenko 
>>> Signed-off-by: Andrew Rybchenko 
>>> ---
>>>  drivers/net/virtio/virtio_ethdev.c | 25 +
>>>  1 file changed, 25 insertions(+)
>>>
>>> diff --git a/drivers/net/virtio/virtio_ethdev.c 
>>> b/drivers/net/virtio/virtio_ethdev.c
>>> index e58085a2c9..601c03e079 100644
>>> --- a/drivers/net/virtio/virtio_ethdev.c
>>> +++ b/drivers/net/virtio/virtio_ethdev.c
>>> @@ -33,6 +33,7 @@
>>>  #include "virtio_logs.h"
>>>  #include "virtqueue.h"
>>>  #include "virtio_rxtx.h"
>>> +#include "virtio_rxtx_simple.h"
>>>  #include "virtio_user/virtio_user_dev.h"
>>>  
>>>  static int  virtio_dev_configure(struct rte_eth_dev *dev);
>>> @@ -2536,6 +2537,30 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
>>> rte_eth_dev_info *dev_info)
>>> if ((host_features & tso_mask) == tso_mask)
>>> dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
>>>  
>>> +   if (host_features & (1ULL << VIRTIO_F_RING_PACKED)) {
>>> +   /*
>>> +* According to 2.7 Packed Virtqueues,
>>> +* 2.7.10.1 Structure Size and Alignment:
>>> +* The Queue Size value does not have to be a power of 2.
>>> +*/
>>> +   dev_info->rx_desc_lim.nb_max = UINT16_MAX;
>>> +   } else {
>>> +   /*
>>> +* According to 2.6 Split Virtqueues:
>>> +* Queue Size value is always a power of 2. The maximum Queue
>>> +* Size value is 32768.
>>> +*/
>>> +   dev_info->rx_desc_lim.nb_max = 32768;
>>> +   }
>>> +   /*
>>> +* Actual minimum is not the same for virtqueues of different kinds,
>>> +* but to avoid tangling the code with separate branches, rely on
>>> +* default thresholds since desc number must be at least of their size.
>>> +*/
>>> +   dev_info->rx_desc_lim.nb_min = RTE_MAX(DEFAULT_RX_FREE_THRESH,
>>> +  RTE_VIRTIO_VPMD_RX_REARM_THRESH);
>>> +   dev_info->rx_desc_lim.nb_align = 1;
>>> +
>>
>> It makes sense, but shouldn't we do the same for dev_info->tx_desc_lim?
> 
> Yes, you're right. We'll care about it.
> 
> I suggest to add "Rx" in the summary and care about Tx in a
> subsequent patch.
> 
> net/virtio: report max/min/align Rx desc limits in dev info
> 
> OK?
> 

Works for me! I will fix the title while applying.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: remove handling of zero desc number on RxQ setup

2021-09-14 Thread Andrew Rybchenko
On 9/14/21 10:26 AM, Maxime Coquelin wrote:
> 
> 
> On 9/14/21 8:40 AM, Andrew Rybchenko wrote:
>> On 9/13/21 10:25 PM, Maxime Coquelin wrote:
>>>
>>>
>>> On 8/20/21 2:47 PM, Andrew Rybchenko wrote:
 From: Ivan Ilchenko 

 Rx queue setup callback allows to use the whole ring when
 descriptor number argument equals zero. There's no point to
 handle zero in any way since RTE Rx queue setup function
 rte_eth_rx_queue_setup() doesn't pass zero using fallback
 values.

 Fixes: 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters")
 Cc: sta...@dpdk.org

 Signed-off-by: Ivan Ilchenko 
 Signed-off-by: Andrew Rybchenko 
 ---
  drivers/net/virtio/virtio_rxtx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/net/virtio/virtio_rxtx.c 
 b/drivers/net/virtio/virtio_rxtx.c
 index 8a48fba5cc..18f03c9fc9 100644
 --- a/drivers/net/virtio/virtio_rxtx.c
 +++ b/drivers/net/virtio/virtio_rxtx.c
 @@ -706,7 +706,7 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
}
vq->vq_free_thresh = rx_free_thresh;
  
 -  if (nb_desc == 0 || nb_desc > vq->vq_nentries)
 +  if (nb_desc > vq->vq_nentries)
nb_desc = vq->vq_nentries;
vq->vq_free_cnt = RTE_MIN(vq->vq_free_cnt, nb_desc);
  

>>>
>>> Is that really a fix?
>>> I see it more like an optimization in a cold path, so maybe it is not
>>> worth backporting?
>>
>> The main idea is not an optimization, but simplification of
>> the code to make it easier to understand. Less special
>> cases is better.
>>
>> I agree that it does not make sense to backport it.
> 
> Ok, thanks. I'll will remove the Fixes tag while applying, no need to
> resubmit.

Thanks,
Andrew.

> Maxime
>>
>>> Other than that:
>>> Reviewed-by: Maxime Coquelin 
>>
>> Thanks,
>> Andrew.
>>



Re: [dpdk-dev] [PATCH v8] doc: add release milestones definition

2021-09-14 Thread Thomas Monjalon
03/09/2021 17:35, Ferruh Yigit:
> On 9/3/2021 12:50 PM, Thomas Monjalon wrote:
> > 02/09/2021 18:33, Ferruh Yigit:
> >> On 8/26/2021 11:11 AM, Thomas Monjalon wrote:
> >>> +* Any issue found in -rc1 should be fixed.
> >>> +
> >>> +rc3
> >>> +~~~
> >>> +
> >>> +* Priority: applications. No application feature should be accepted 
> >>> after -rc3.
> >>> +* New functionality that does not depend on libraries update
> >>> +  can be integrated as part of -rc3.
> >>> +* The application change must include documentation in the relevant .rst 
> >>> files
> >>> +  (application-specific and release notes if significant).
> >>> +* Libraries and drivers cleanup are allowed.
> >>> +* Small driver reworks.
> >>> +* Critical and minor bug fixes.
> >>
> >> As mentioned before, my concern is this may create false impression that 
> >> bugs
> >> are fixed only in this phase. What about remove this line completely and 
> >> update
> >> below -rc4 one as 'Critical bug fixes only.'? I think that makes intention 
> >> more
> >> clear.
> > 
> > I had added in -rc2: "Any issue found in -rc1 should be fixed."
> > Do you want to remove it as well?
> 
> I think we can keep it, good to highlight one of the major tasks for -rc2 is 
> to
> fix defects found in -rc1, and it doesn't limit fixes to ones found in -rc1.

Actually I think it is better to remove.
It looks weird to have it only in -rc2.




[dpdk-dev] [PATCH v9] doc: add release milestones definition

2021-09-14 Thread Thomas Monjalon
From: Asaf Penso 

Adding more information about the release milestones.
This includes the scope of change, expectations, etc.

Signed-off-by: Asaf Penso 
Signed-off-by: Thomas Monjalon 
Acked-by: John McNamara 
Acked-by: Ajit Khaparde 
Acked-by: Bruce Richardson 
Acked-by: Andrew Rybchenko 
---
v2: fix styling format and add content in the commit message
v3: change punctuation and avoid plural form when unneeded
v4: avoid abbreviations, "Priority" in -rc, and reword as John suggests
v5: note that release candidates may vary
v6: merge RFC and proposal deadline, add roadmap link and reduce duplication
v7: make expectations clearer and stricter
v8: add tests, more fixes, maintainers approval and new API rules
v9: make deadlines more explicit, remove confusing lines about fixes
---
 doc/guides/contributing/patches.rst | 83 +++--
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/doc/guides/contributing/patches.rst 
b/doc/guides/contributing/patches.rst
index b9cc6e67ae..5a83209474 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -164,6 +164,10 @@ Make your planned changes in the cloned ``dpdk`` repo. 
Here are some guidelines
   the :doc:`ABI policy ` and :ref:`ABI versioning `
   guides. New external functions should also be added in alphabetical order.
 
+* Any new API function should be used in ``/app`` test directory.
+
+* When introducing a new device API, at least one driver should implement it.
+
 * Important changes will require an addition to the release notes in 
``doc/guides/rel_notes/``.
   See the :ref:`Release Notes section of the Documentation Guidelines 
` for details.
 
@@ -177,6 +181,8 @@ Make your planned changes in the cloned ``dpdk`` repo. Here 
are some guidelines
 * Add documentation, if relevant, in the form of Doxygen comments or a User 
Guide in RST format.
   See the :ref:`Documentation Guidelines `.
 
+* Code and related documentation must be updated atomically in the same patch.
+
 Once the changes have been made you should commit them to your local repo.
 
 For small changes, that do not require specific explanations, it is better to 
keep things together in the
@@ -185,11 +191,6 @@ Larger changes that require different explanations should 
be separated into logi
 A good way of thinking about whether a patch should be split is to consider 
whether the change could be
 applied without dependencies as a backport.
 
-It is better to keep the related documentation changes in the same patch
-file as the code, rather than one big documentation patch at the end of a
-patchset. This makes it easier for future maintenance and development of the
-code.
-
 As a guide to how patches should be structured run ``git log`` on similar 
files.
 
 
@@ -663,3 +664,75 @@ patch accepted. The general cycle for patch review and 
acceptance is:
  than rework of the original.
* Trivial patches may be merged sooner than described above at the tree 
committer's
  discretion.
+
+
+Milestones definition
+-
+
+Each DPDK release has milestones that help everyone to converge to the release 
date.
+The following is a list of these milestones together with
+concrete definitions and expectations for a typical release cycle.
+An average cycle lasts 3 months and have 4 release candidates in the last 
month.
+Test reports are expected to be received after each release candidate.
+The number and expectations of release candidates might vary slightly.
+The schedule is updated in the `roadmap 
`_.
+
+.. note::
+   Sooner is always better. Deadlines are not ideal dates.
+
+   Integration is never guaranteed but everyone can help.
+
+Roadmap
+~~~
+
+* Announce new features in libraries, drivers, applications, and examples.
+* To be published before the previous release.
+
+Proposal Deadline
+~
+
+* Must send an RFC (Request For Comments) or a complete patch of new features.
+* Early RFC gives time for design review before complete implementation.
+* Should include at least the API changes in libraries and applications.
+* Library code should be quite complete at the deadline.
+* Nice to have: driver implementation, example code, and documentation.
+
+rc1
+~~~
+
+* Priority: libraries. No library feature should be accepted after -rc1.
+* API changes or additions must be implemented in libraries.
+* The API must include Doxygen documentation
+  and be part of the relevant .rst files (library-specific and release notes).
+* API should be used in a test application (``/app``).
+* At least one PMD should implement the API.
+  It may be a draft sent in a separate series.
+* The above should be sent to the mailing list at least 2 weeks before -rc1
+  to give time for review and maintainers approval.
+* If no review after 10 days, a reminder should be sent.
+* Nice to have: example code (``/examples``)
+
+rc2
+~~~
+
+* Priority: drivers. No driver feature should

Re: [dpdk-dev] [PATCH] app/testpmd: Document what the application does.

2021-09-14 Thread Ferruh Yigit
On 9/10/2021 6:57 PM, Ben Pfaff wrote:
> I could not find anything in the documentation that says what
> testpmd does.  This should save other people time trying to
> figure that out in the future.
> 
> Signed-off-by: Ben Pfaff 
> ---
>  doc/guides/testpmd_app_ug/run_app.rst | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/testpmd_app_ug/run_app.rst 
> b/doc/guides/testpmd_app_ug/run_app.rst
> index 6061674239..7c3406f72b 100644
> --- a/doc/guides/testpmd_app_ug/run_app.rst
> +++ b/doc/guides/testpmd_app_ug/run_app.rst
> @@ -15,6 +15,12 @@ a list of available EAL command-line options.
>  Testpmd Command-line Options
>  
>  
> +By default, testpmd receives packets on each configured port and
> +forwards the received packets to its paired port.  Ports 0 and 1 are
> +paired, as are ports 2 and 3, and so on.  With an odd number of ports,
> +the last port is paired with itself: packets received on the port are
> +sent back out on the same port.
Hi Ben,

testpmd has the concept of 'forwarding engine' (struct fwd_engine), which is
decoupled from testpmd logic and can be changed in the runtime. What you
described above is the 'io' (default) forwarding engine.
There are forwarding engines like 'flowgen' that generates and send L3 packets,
so behaves like a very simple packet generator, etc...

And even for the 'io' forwarding engine, the paired port description above is
controlled by "--port-topology=", so for example if the
'chained' topology is selected, packets will be forwarded as 0 -> 1 -> 2 -> 3 ..


Overall, I understand the intention to briefly describe the testpmd, but I think
it is not correct to reduce the description to packet forwarding, although that
is an important function of testpmd, it is not only function and testpmd does
much more, it has many control path functions.

Cheers,
ferruh


> +
>  The following are the command-line options for the testpmd applications.
>  They must be separated from the EAL options, shown in the previous section, 
> with a ``--`` separator:
>  
> 



Re: [dpdk-dev] [PATCH v2] app/testpmd: Document what the application does.

2021-09-14 Thread Ferruh Yigit
On 9/13/2021 7:33 PM, Ben Pfaff wrote:
> I could not find anything in the documentation that says what
> testpmd does.  This should save other people time trying to
> figure that out in the future.
> 
> Signed-off-by: Ben Pfaff 
> ---
> v1->v2: Revise introduction instead of option documentation.
>   Thanks to Thomas Monjalon for advice.
> 
>  doc/guides/testpmd_app_ug/intro.rst | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/testpmd_app_ug/intro.rst 
> b/doc/guides/testpmd_app_ug/intro.rst
> index 5d8d8cf4eb..1129f53c62 100644
> --- a/doc/guides/testpmd_app_ug/intro.rst
> +++ b/doc/guides/testpmd_app_ug/intro.rst
> @@ -6,9 +6,13 @@ Introduction
>  
>  This document is a user guide for the ``testpmd`` example application that 
> is shipped as part of the Data Plane Development Kit.
>  
> -The ``testpmd`` application can be used to test the DPDK in a packet 
> forwarding mode
> -and also to access NIC hardware features such as Flow Director.
> -It also serves as a example of how to build a more fully-featured 
> application using the DPDK SDK.
> +``testpmd`` is a tool to test ethdev NIC features, including NIC
> +hardware features such as Flow Director.  It receives packets on each
> +configured port and forwards them.  By default, packets received on
> +port 0 are forwarded to port 1, and vice versa, and similarly for
> +ports 2 and 3, ports 4 and 5, and so on.  If an odd number of ports is
> +configured, packets received on the last port are sent back out on the
> +same port.
>  

Ahh, I missed that there is a v2, I already commented on the first version.

Just to summarize here, I think documenting a specific forwarding engine and
specific configuration of it as what 'testpmd' does can be misleading.

What about making it a little more generic and provide link for more details,
like: https://doc.dpdk.org/guides/testpmd_app_ug/testpmd_funcs.html#set-fwd



>  The guide shows how to build and run the testpmd application and
>  how to configure the application from the command line and the run-time 
> environment.
> 



Re: [dpdk-dev] [PATCH 8/8] bus/pci: remove ABIs in PCI bus

2021-09-14 Thread Xu, Rosen



> -Original Message-
> From: Xia, Chenbo 
> Sent: Friday, September 10, 2021 10:24
> To: dev@dpdk.org
> Cc: Chautru, Nicolas ; Yigit, Ferruh
> ; Burakov, Anatoly ;
> Ray Kinsella ; Nithin Dabilpuram
> ; Kiran Kumar K ;
> Sunil Kumar Kori ; Satha Rao
> ; Matan Azrad ; Shahaf
> Shuler ; Viacheslav Ovsiienko
> ; Jerin Jacob ; Anoob Joseph
> ; Trahe, Fiona ; Griffin, John
> ; Jain, Deepak K ;
> Andrew Rybchenko ; Ashish Gupta
> ; Somalapuram Amaranath
> ; Ankur Dwivedi ; Tejasree
> Kondoj ; Nagadheeraj Rottela
> ; Srikanth Jampala ;
> Jay Zhou ; McDaniel, Timothy
> ; Pavan Nikhilesh
> ; Ashwin Sekhar T K ;
> Harman Kalra ; Shepard Siegel
> ; Ed Czeck ;
> John Miller ; Steven Webster
> ; Peters, Matt
> ; Rasesh Mody ;
> Shahed Shaikh ; Ajit Khaparde
> ; Somnath Kotur
> ; Chas Williams ; Min Hu
> (Connor) ; Rahul Lakkireddy
> ; Wang, Haiyue ;
> Marcin Wojtas ; Michal Krawczyk ;
> Shai Brandes ; Evgeny Schemeilin
> ; Igor Chauskin ; Daley, John
> ; Hyong Youb Kim ; Ziyang
> Xuan ; Xiaoyun Wang
> ; Guoyang Zhou
> ; Yisen Zhuang ;
> Lijun Ou ; Xing, Beilei ;
> Andrew Boyer ; Xu, Rosen ;
> Stephen Hemminger ; Long Li
> ; Devendra Singh Rawat
> ; Maciej Czekaj ;
> Jiawen Wu ; Jian Wang
> ; Maxime Coquelin
> ; Yong Wang ;
> Jakub Palider ; Tomasz Duszynski
> ; Zhang, Tianfei ;
> Richardson, Bruce ; Li, Xiaoyun
> ; Wu, Jingjing ; Radha Mohan
> Chintakuntla ; Veerasenareddy Burru
> ; Ori Kam ; Wang, Xiao W
> ; Thomas Monjalon 
> Subject: [PATCH 8/8] bus/pci: remove ABIs in PCI bus
> 
> As announced in the deprecation note, most of ABIs in PCI bus are
> removed in this patch. Only the function rte_pci_dump is still ABI
> and experimental APIs are kept for future promotion.
> 
> This patch creates a new file named pci_driver.h and moves most of
> the content in original rte_bus_pci.h to it. After that, pci_driver.h
> is considered the interface for drivers and rte_bus_pci.h for
> applications. pci_driver.h is defined as driver_sdk_headers so that
> out-of-tree drivers can use it.
> 
> Then this patch replaces the including of rte_bus_pci.h with pci_driver.h
> in all related drivers.
> 
> Signed-off-by: Chenbo Xia 
> ---
>  app/test/virtual_pmd.c|   2 +-
>  doc/guides/rel_notes/release_21_11.rst|   2 +
>  drivers/baseband/acc100/rte_acc100_pmd.c  |   2 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c |   2 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   2 +-
>  drivers/bus/pci/bsd/pci.c |   1 -
>  drivers/bus/pci/linux/pci.c   |   1 -
>  drivers/bus/pci/linux/pci_uio.c   |   1 -
>  drivers/bus/pci/linux/pci_vfio.c  |   1 -
>  drivers/bus/pci/meson.build   |   4 +
>  drivers/bus/pci/pci_common_uio.c  |   1 -
>  drivers/bus/pci/pci_driver.h  | 402 ++
>  drivers/bus/pci/pci_params.c  |   1 -
>  drivers/bus/pci/private.h |   3 +-
>  drivers/bus/pci/rte_bus_pci.h | 375 +---
>  drivers/bus/pci/version.map   |  32 +-
>  drivers/common/cnxk/roc_platform.h|   2 +-
>  drivers/common/mlx5/linux/mlx5_common_verbs.c |   2 +-
>  drivers/common/mlx5/mlx5_common_pci.c |   2 +-
>  drivers/common/octeontx2/otx2_dev.h   |   2 +-
>  drivers/common/octeontx2/otx2_sec_idev.c  |   2 +-
>  drivers/common/qat/qat_device.h   |   2 +-
>  drivers/common/qat/qat_qp.c   |   2 +-
>  drivers/common/sfc_efx/sfc_efx.h  |   2 +-
>  drivers/compress/mlx5/mlx5_compress.c |   2 +-
>  drivers/compress/octeontx/otx_zip.h   |   2 +-
>  drivers/compress/qat/qat_comp.c   |   2 +-
>  drivers/crypto/ccp/ccp_dev.h  |   2 +-
>  drivers/crypto/ccp/ccp_pci.h  |   2 +-
>  drivers/crypto/ccp/rte_ccp_pmd.c  |   2 +-
>  drivers/crypto/cnxk/cn10k_cryptodev.c |   2 +-
>  drivers/crypto/cnxk/cn9k_cryptodev.c  |   2 +-
>  drivers/crypto/mlx5/mlx5_crypto.c |   2 +-
>  drivers/crypto/nitrox/nitrox_device.h |   2 +-
>  drivers/crypto/octeontx/otx_cryptodev.c   |   2 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   2 +-
>  drivers/crypto/octeontx2/otx2_cryptodev.c |   2 +-
>  drivers/crypto/qat/qat_sym.c  |   2 +-
>  drivers/crypto/qat/qat_sym_pmd.c  |   2 +-
>  drivers/crypto/virtio/virtio_cryptodev.c  |   2 +-
>  drivers/crypto/virtio/virtio_pci.h|   2 +-
>  drivers/event/dlb2/pf/dlb2_main.h |   2 +-
>  drivers/event/dlb2/pf/dlb2_pf.c   |   2 +-
>  drivers/event/octeontx/ssovf_probe.c  |   2 +-
>  drivers/event/octeontx/timvf_probe.c  |   2 +-
>  drivers/event/octeontx2/otx2_evdev.c  |   2 +-
>  drivers/mempool/cnxk/cnxk_mempool.c   |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c |   2 +-
>  drivers/mempool/octeontx2/otx2_mempoo

Re: [dpdk-dev] [PATCH v2 15/18] vhost: fix typo in comment

2021-09-14 Thread Maxime Coquelin
Hi Stehpen,

On 9/13/21 6:10 PM, Stephen Hemminger wrote:
> Yet another spelling error found by codespell.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/vhost/rte_vhost.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index 8d875e932297..c36dfc705b04 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -776,7 +776,7 @@ rte_vhost_get_vhost_ring_inflight(int vid, uint16_t 
> vring_idx,
>  /**
>   * Set split inflight descriptor.
>   *
> - * This function save descriptors that has been comsumed in available
> + * This function save descriptors that has been consumed in available

You missed my comment on v1:

s/that has been/that have been/


>   * ring
>   *
>   * @param vid
> @@ -796,7 +796,7 @@ rte_vhost_set_inflight_desc_split(int vid, uint16_t 
> vring_idx,
>  /**
>   * Set packed inflight descriptor and get corresponding inflight entry
>   *
> - * This function save descriptors that has been comsumed
> + * This function save descriptors that has been consumed

Same here.

>   *
>   * @param vid
>   *  vhost device ID
> 



Re: [dpdk-dev] [PATCH v2 1/2] net/virtio: reconfigure LSC handler when required

2021-09-14 Thread Maxime Coquelin



On 8/31/21 5:54 PM, David Marchand wrote:
> There is no reason to re-register a interrupt handler for LSC if this
> feature was not requested in the first place.
> A simple usecase is when asking for Rx interrupts without LSC interrupt.
> 
> Fixes: 26b683b4f7d0 ("net/virtio: setup Rx queue interrupts")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index e58085a2c9..314a291e8c 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1684,13 +1684,15 @@ virtio_configure_intr(struct rte_eth_dev *dev)
>   }
>   }
>  
> - /* Re-register callback to update max_intr */
> - rte_intr_callback_unregister(dev->intr_handle,
> -  virtio_interrupt_handler,
> -  dev);
> - rte_intr_callback_register(dev->intr_handle,
> -virtio_interrupt_handler,
> -dev);
> + if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) {
> + /* Re-register callback to update max_intr */
> + rte_intr_callback_unregister(dev->intr_handle,
> +  virtio_interrupt_handler,
> +  dev);
> + rte_intr_callback_register(dev->intr_handle,
> +virtio_interrupt_handler,
> +dev);
> + }
>  
>   /* DO NOT try to remove this! This function will enable msix, or QEMU
>* will encounter SIGSEGV when DRIVER_OK is sent.
> 

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v2 2/2] net/virtio: fix virtio-user Rx interrupts with multi queue

2021-09-14 Thread Maxime Coquelin



On 8/31/21 5:54 PM, David Marchand wrote:
> The callfds[] array stores eventfds sequentially for Rx and Tx vq.
> 
> Fixes: 3d4fb6fd2505 ("net/virtio-user: support Rx interrupt")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 
> ---
>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 16c58710d7..89f8b2271f 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -416,7 +416,7 @@ virtio_user_fill_intr_handle(struct virtio_user_dev *dev)
>   }
>  
>   for (i = 0; i < dev->max_queue_pairs; ++i)
> - eth_dev->intr_handle->efds[i] = dev->callfds[i];
> + eth_dev->intr_handle->efds[i] = dev->callfds[2 * i];
>   eth_dev->intr_handle->nb_efd = dev->max_queue_pairs;
>   eth_dev->intr_handle->max_intr = dev->max_queue_pairs + 1;
>   eth_dev->intr_handle->type = RTE_INTR_HANDLE_VDEV;
> 

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



[dpdk-dev] [PATCH v1 0/2] fix RSS configuration failure

2021-09-14 Thread Wenjun Wu
Due to share code limitation, independent configuration of default
RSS for IP fragment packets need to be removed.

Wenjun Wu (2):
  net/ice: fix RSS configuration failure
  net/iavf: fix RSS configuration failure

 drivers/net/iavf/iavf_hash.c | 10 --
 drivers/net/ice/ice_ethdev.c | 22 +-
 2 files changed, 1 insertion(+), 31 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v1 1/2] net/ice: fix RSS configuration failure

2021-09-14 Thread Wenjun Wu
Due to share code limitation, when RSS for IP packets and IP fragment
packets coexists, they cannot share the same hash field.
As a result, independent configuration of default RSS for IP fragment
packets need to be removed.

This patch revert the original patch to fix this failure.

Fixes: 91f59358dc05 ("net/ice: fix default RSS hash for IP fragment packets")
Fixes: 4027fffe86f4 ("net/ice: support default RSS for IP fragment packet")
Cc: sta...@dpdk.org

Signed-off-by: Wenjun Wu 
---
 drivers/net/ice/ice_ethdev.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 8d62b84805..0fcaf24fb1 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -2822,9 +2822,7 @@ ice_rss_hash_set(struct ice_pf *pf, uint64_t rss_hf)
ETH_RSS_NONFRAG_IPV4_TCP | \
ETH_RSS_NONFRAG_IPV6_TCP | \
ETH_RSS_NONFRAG_IPV4_SCTP | \
-   ETH_RSS_NONFRAG_IPV6_SCTP | \
-   ETH_RSS_FRAG_IPV4 | \
-   ETH_RSS_FRAG_IPV6)
+   ETH_RSS_NONFRAG_IPV6_SCTP)
 
ret = ice_rem_vsi_rss_cfg(hw, vsi->idx);
if (ret)
@@ -2979,24 +2977,6 @@ ice_rss_hash_set(struct ice_pf *pf, uint64_t rss_hf)
__func__, ret);
}
 
-   if (rss_hf & ETH_RSS_FRAG_IPV4) {
-   cfg.addl_hdrs = ICE_FLOW_SEG_HDR_IPV4 | 
ICE_FLOW_SEG_HDR_IPV_FRAG;
-   cfg.hash_flds = ICE_FLOW_HASH_IPV4;
-   ret = ice_add_rss_cfg_wrap(pf, vsi->idx, &cfg);
-   if (ret)
-   PMD_DRV_LOG(ERR, "%s IPV4_FRAG rss flow fail %d",
-   __func__, ret);
-   }
-
-   if (rss_hf & ETH_RSS_FRAG_IPV6) {
-   cfg.addl_hdrs = ICE_FLOW_SEG_HDR_IPV6 | 
ICE_FLOW_SEG_HDR_IPV_FRAG;
-   cfg.hash_flds = ICE_FLOW_HASH_IPV6;
-   ret = ice_add_rss_cfg_wrap(pf, vsi->idx, &cfg);
-   if (ret)
-   PMD_DRV_LOG(ERR, "%s IPV6_FRAG rss flow fail %d",
-   __func__, ret);
-   }
-
pf->rss_hf = rss_hf & ICE_RSS_HF_ALL;
 }
 
-- 
2.25.1



[dpdk-dev] [PATCH v1 2/2] net/iavf: fix RSS configuration failure

2021-09-14 Thread Wenjun Wu
Due to share code limitation, when RSS for IP packets and IP fragment
packets coexists, they cannot share the same hash field.
As a result, independent configuration of default RSS for IP fragment
packets need to be removed.

This patch revert the original patch to fix this failure.

Fixes: c40525568480 ("net/iavf: fix default RSS hash for IP fragment packets")
Fixes: 9e29a278bc0c ("net/iavf: support default RSS for IP fragment")
Cc: sta...@dpdk.org

Signed-off-by: Wenjun Wu 
---
 drivers/net/iavf/iavf_hash.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/iavf/iavf_hash.c b/drivers/net/iavf/iavf_hash.c
index eba55ecea5..03dae5d999 100644
--- a/drivers/net/iavf/iavf_hash.c
+++ b/drivers/net/iavf/iavf_hash.c
@@ -623,16 +623,6 @@ iavf_rss_hash_set(struct iavf_adapter *ad, uint64_t 
rss_hf, bool add)
iavf_add_del_rss_cfg(ad, &rss_cfg, add);
}
 
-   if (rss_hf & ETH_RSS_FRAG_IPV4) {
-   rss_cfg.proto_hdrs = outer_ipv4_tmplt;
-   iavf_add_del_rss_cfg(ad, &rss_cfg, add);
-   }
-
-   if (rss_hf & ETH_RSS_FRAG_IPV6) {
-   rss_cfg.proto_hdrs = outer_ipv6_tmplt;
-   iavf_add_del_rss_cfg(ad, &rss_cfg, add);
-   }
-
vf->rss_hf = rss_hf & IAVF_RSS_HF_ALL;
return 0;
 }
-- 
2.25.1



Re: [dpdk-dev] [PATCH v2] config/ppc: ignore gcc 11 psabi warnings

2021-09-14 Thread Ferruh Yigit
On 9/3/2021 12:53 AM, David Christensen wrote:
> Suppress the gcc warning "note: the layout of aggregates containing
> vectors with 4-byte alignment has changed in GCC 5" on POWER systems
> by setting "-Wno-psabi".  Warning was originally added to gcc in
> commit https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9832651 to warn
> of the vector alignment changes introduced in GCC 5.  Older gcc
> versions forced vector alignment to 16 bytes due to requirements for
> POWER 6 and earlier CPUs, but these restrictions don't apply to CPUs
> supported by DPDK.
> 
> Bugzilla ID: 739
> 
> Signed-off-by: David Christensen 
> ---
> v2:
> - update copyright year
> - rebase for 21.11-rc0
> ---
>  config/ppc/meson.build | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/config/ppc/meson.build b/config/ppc/meson.build
> index adf49e1f42..5354db4e0a 100644
> --- a/config/ppc/meson.build
> +++ b/config/ppc/meson.build
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2018 Luca Boccassi 
> +# Copyright(c) 2021 IBM Corporation
> 
>  if not dpdk_conf.get('RTE_ARCH_64')
>  error('Only 64-bit compiles are supported for this platform type')
> @@ -17,6 +18,12 @@ if not power9_supported
>  dpdk_conf.set('RTE_MACHINE','power8')
>  endif
> 
> +# Suppress the gcc warning "note: the layout of aggregates containing
> +# vectors with 4-byte alignment has changed in GCC 5".
> +if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and 
> cc.version().version_compare('<12.0') and cc.has_argument('-Wno-psabi')
> +add_project_arguments('-Wno-psabi', language: 'c')
> +endif
> +
>  # Certain POWER9 systems can scale as high as 1536 LCORES, but setting such a
>  # high value can waste memory, cause timeouts in time limited autotests, and 
> is
>  # unlikely to be used in many production situations.  Similarly, keeping the
> --

I am getting following build error in my environment:
"config/ppc/meson.build:23:6: ERROR: Unknown statement."

The compiler I have is:
powerpc64le-linux-gcc (gcc 10.2.0 "powerpc64le-linux-gcc.br_real (Buildroot
2020.08-14-ge5a2a90) 10.2.0")

meson version: Version: 0.59.1

Multi-line statements seems need to be merged with '\':

diff --git a/config/ppc/meson.build b/config/ppc/meson.build
index 0b1948fc7cb9..f95009c77e7a 100644
--- a/config/ppc/meson.build
+++ b/config/ppc/meson.build
@@ -20,7 +20,7 @@ endif

 # Suppress the gcc warning "note: the layout of aggregates containing
 # vectors with 4-byte alignment has changed in GCC 5".
-if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and
+if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and \
 cc.version().version_compare('<12.0') and cc.has_argument('-Wno-psabi')
 add_project_arguments('-Wno-psabi', language: 'c')
 endif



Re: [dpdk-dev] [PATCH v1 1/6] build: increase default of max lcores to 512

2021-09-14 Thread David Hunt



On 10/9/2021 9:24 AM, Thomas Monjalon wrote:

10/09/2021 10:06, David Marchand:

On Fri, Sep 10, 2021 at 9:54 AM Bruce Richardson
 wrote:

On Fri, Sep 10, 2021 at 08:51:04AM +0200, David Marchand wrote:

On Thu, Sep 9, 2021 at 4:38 PM Bruce Richardson
 wrote:

On Thu, Sep 09, 2021 at 02:45:06PM +0100, David Hunt wrote:

Modern processors are coming with an ever increasing number of cores,
and 128 does not seem like a sensible max limit any more, especially
when you consider multi-socket systems with Hyper-Threading enabled.

This patch increases max_lcores default from 128 to 512.

Signed-off-by: David Hunt 

Why should we need this?

--lcores makes it possible to pin 128 lcores to any physical core on
your system.
And for applications that have their own thread management, they can
pin thread, then use rte_thread_register.

Do you have applications that require more than 128 lcores?


The trouble is that using the --lcores syntax for mapping high core numbers
to low lcore ids is much more awkward to use. Every case of DPDK use I've
seen uses -c with a coremask, or -l with just giving a few core numbers on
it. This simple scheme won't work with core numbers greater than 128, and
there are already systems available with more than that number of cores.

Apart from the memory footprint issues - which this patch is already making
a good start in addressing, why would we not increase the default
max_lcores to that seen on real systems?

The memory footprint is a major issue to me, and reserving all those
lcores won't be needed in any system.
We will also have to decide on a "640k ought to be enough" value to
avoid ABI issue with the next processor that comes out and has more
than 512 cores.

Could we wire the -c / -l options to --lcores behavior ?
It breaks the 1:1 lcore/physical core assumption, but it solves your
usability issue.

Why would we change existing options while we already have an option
(--lcores) which solves the issue above?
I think the only issue is to educate users.
Is there something to improve in the documentation?



Hi all,
I agree that it’s a good idea to switch to using the “--lcrores” option 
for cores above the default, that’s already future proofed.
However, I’m still a little concerned about usability, if our users are 
accustomed to the “-c” and “-l” options, I suggest that we add a warning 
to suggest using the “--lcores” option if any of the cores provided on 
the command line are above RTE_MAX_LCORE. That would help them with the 
solution to using physical cores above 128 (or whatever the compiled 
default is).


Example:

“ERROR: logical core 212 is above the maximum lcore number permitted.
Please use the --lcores option to map lcores onto physical cores, e.g. 
--lcores="(0-3)@(212-215).”


I’ll replace the first patch in the set with a patch that adds the 
additional information in the error message.


Thanks,
Dave.




[dpdk-dev] [RFC PATCH 00/10] Support MLX5 crypto driver on Windows

2021-09-14 Thread Tal Shnaiderman
Support the MLX5 crypto driver on Windows OS by moving the driver's
control path communication with the Kernel to be OS agnostic.
---
Depends-on: patch 98796 ("cryptodev: build on Windows")
---
Tal Shnaiderman (10):
  common/mlx5: add DV enums to Windows defs file
  common/mlx5: add an agnostic OS function to open device context
  common/mlx5: move pdn getter to common driver
  common/mlx5: add memory region OS agnostic functions for Linux
  crypto/mlx5: replace UNIX functions with EAL functions
  crypto/mlx5: use OS agnostic functions for UMEM operations
  crypto/mlx5: use OS agnostic functions for PD operations
  crypto/mlx5: use OS agnostic functions for Verbs operations
  crypto/mlx5: fix size of UMR WQE
  crypto/mlx5: support on Windows

 drivers/common/mlx5/linux/mlx5_common_os.c   |  98 +++
 drivers/common/mlx5/mlx5_common.h|  17 
 drivers/common/mlx5/version.map  |   5 +-
 drivers/common/mlx5/windows/mlx5_common_os.c | 141 ++-
 drivers/common/mlx5/windows/mlx5_common_os.h |   8 +-
 drivers/common/mlx5/windows/mlx5_win_defs.h  |  12 +++
 drivers/crypto/aesni_gcm/meson.build |   6 ++
 drivers/crypto/aesni_mb/meson.build  |   6 ++
 drivers/crypto/armv8/meson.build |   6 ++
 drivers/crypto/bcmfs/meson.build |   6 ++
 drivers/crypto/ccp/meson.build   |   1 +
 drivers/crypto/kasumi/meson.build|   6 ++
 drivers/crypto/meson.build   |   3 -
 drivers/crypto/mlx5/meson.build  |   4 +-
 drivers/crypto/mlx5/mlx5_crypto.c|  80 ---
 drivers/crypto/mlx5/mlx5_crypto.h|   6 +-
 drivers/crypto/mvsam/meson.build |   6 ++
 drivers/crypto/null/meson.build  |   6 ++
 drivers/crypto/octeontx/meson.build  |   6 ++
 drivers/crypto/openssl/meson.build   |   6 ++
 drivers/crypto/qat/meson.build   |   6 ++
 drivers/crypto/scheduler/meson.build |   6 ++
 drivers/crypto/snow3g/meson.build|   6 ++
 drivers/crypto/virtio/meson.build|   6 ++
 drivers/crypto/zuc/meson.build   |   6 ++
 drivers/net/mlx5/linux/mlx5_os.c |  35 ---
 drivers/net/mlx5/mlx5.h  |   1 -
 drivers/net/mlx5/windows/mlx5_os.c   |  85 +---
 28 files changed, 402 insertions(+), 178 deletions(-)

-- 
2.16.1.windows.4



[dpdk-dev] [RFC PATCH 02/10] common/mlx5: add an agnostic OS function to open device context

2021-09-14 Thread Tal Shnaiderman
Add a function to open device context from a rte_device.

Function mlx5_os_open_device_context can be used both on
Windows and Linux OS.

Signed-off-by: Tal Shnaiderman 
---
 drivers/common/mlx5/linux/mlx5_common_os.c   |  28 +++
 drivers/common/mlx5/mlx5_common.h|   4 +
 drivers/common/mlx5/version.map  |   2 +
 drivers/common/mlx5/windows/mlx5_common_os.c | 118 +++
 drivers/common/mlx5/windows/mlx5_common_os.h |   2 +
 drivers/net/mlx5/windows/mlx5_os.c   |  64 +--
 6 files changed, 155 insertions(+), 63 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c 
b/drivers/common/mlx5/linux/mlx5_common_os.c
index 9e0c823c97..3ef507944f 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -428,3 +428,31 @@ mlx5_os_get_ibv_device(const struct rte_pci_addr *addr)
mlx5_glue->free_device_list(ibv_list);
return ibv_match;
 }
+
+/**
+ * Open device context from a rte_device.
+ *
+ * @param[in] dev
+ *  Pointer to an rte_device struct.
+ * @return
+ *   Pointer to device context or NULL in case context cannot be found.
+ */
+void *
+mlx5_os_open_device_context(struct rte_device *dev)
+{
+   struct ibv_device *ibv;
+   void *ctx;
+
+   ibv = mlx5_os_get_ibv_dev(dev);
+   if (ibv == NULL) {
+   DRV_LOG(ERR, "Failed getting ibv_dev");
+   return NULL;
+   }
+   ctx = mlx5_glue->dv_open_device(ibv);
+   if (ctx == NULL) {
+   DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
+   rte_errno = ENODEV;
+   return NULL;
+   }
+   return ctx;
+}
diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index a772371200..249804b00c 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -419,4 +419,8 @@ __rte_internal
 bool
 mlx5_dev_is_pci(const struct rte_device *dev);
 
+__rte_internal
+void *
+mlx5_os_open_device_context(struct rte_device *dev);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e5cb6b7060..6d4258dd25 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -141,6 +141,8 @@ INTERNAL {
mlx5_os_alloc_pd;
mlx5_os_dealloc_pd;
mlx5_os_dereg_mr;
+   mlx5_os_match_devx_devices_to_addr;
+   mlx5_os_open_device_context;
mlx5_os_get_ibv_dev; # WINDOWS_NO_EXPORT
mlx5_os_reg_mr;
mlx5_os_umem_dereg;
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c 
b/drivers/common/mlx5/windows/mlx5_common_os.c
index 5031bdca26..3b59e57e57 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -205,3 +206,120 @@ mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
claim_zero(mlx5_os_umem_dereg(pmd_mr->obj));
memset(pmd_mr, 0, sizeof(*pmd_mr));
 }
+
+/**
+ * Detect if a devx_device_bdf object has identical DBDF values to the
+ * rte_pci_addr found in bus/pci probing
+ *
+ * @param[in] devx_bdf
+ *   Pointer to the devx_device_bdf structure.
+ * @param[in] addr
+ *   Pointer to the rte_pci_addr structure.
+ *
+ * @return
+ *   1 on Device match, 0 on mismatch.
+ */
+static int
+mlx5_os_match_devx_bdf_to_addr(struct devx_device_bdf *devx_bdf,
+   struct rte_pci_addr *addr)
+{
+   if (addr->domain != (devx_bdf->bus_id >> 8) ||
+   addr->bus != (devx_bdf->bus_id & 0xff) ||
+   addr->devid != devx_bdf->dev_id ||
+   addr->function != devx_bdf->fnc_id) {
+   return 0;
+   }
+   return 1;
+}
+
+/**
+ * Detect if a devx_device_bdf object matches the rte_pci_addr
+ * found in bus/pci probing
+ * Compare both the Native/PF BDF and the raw_bdf representing a VF BDF.
+ *
+ * @param[in] devx_bdf
+ *   Pointer to the devx_device_bdf structure.
+ * @param[in] addr
+ *   Pointer to the rte_pci_addr structure.
+ *
+ * @return
+ *   1 on Device match, 0 on mismatch, rte_errno code on failure.
+ */
+int
+mlx5_os_match_devx_devices_to_addr(struct devx_device_bdf *devx_bdf,
+   struct rte_pci_addr *addr)
+{
+   int err;
+   struct devx_device mlx5_dev;
+
+   if (mlx5_os_match_devx_bdf_to_addr(devx_bdf, addr))
+   return 1;
+   /**
+* Didn't match on Native/PF BDF, could still
+* Match a VF BDF, check it next
+*/
+   err = mlx5_glue->query_device(devx_bdf, &mlx5_dev);
+   if (err) {
+   DRV_LOG(ERR, "query_device failed");
+   rte_errno = err;
+   return rte_errno;
+   }
+   if (mlx5_os_match_devx_bdf_to_addr(&mlx5_dev.raw_bdf, addr))
+   return 1;
+   return 0;
+}
+
+/**
+ * Open device context from a rte

[dpdk-dev] [RFC PATCH 01/10] common/mlx5: add DV enums to Windows defs file

2021-09-14 Thread Tal Shnaiderman
Add needed DV enums used by the crypto PMD and missing
for Windows OS.

Signed-off-by: Tal Shnaiderman 
---
 drivers/common/mlx5/windows/mlx5_win_defs.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/common/mlx5/windows/mlx5_win_defs.h 
b/drivers/common/mlx5/windows/mlx5_win_defs.h
index 47bfc907e7..9f709ff30d 100644
--- a/drivers/common/mlx5/windows/mlx5_win_defs.h
+++ b/drivers/common/mlx5/windows/mlx5_win_defs.h
@@ -93,6 +93,18 @@ enum {
MLX5_ETH_WQE_L4_CSUM = (1 << 7),
 };
 
+enum {
+   MLX5_WQE_CTRL_CQ_UPDATE = 2 << 2,
+   MLX5_WQE_CTRL_SOLICITED = 1 << 1,
+   MLX5_WQE_CTRL_FENCE = 4 << 5,
+   MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE = 1 << 5,
+};
+
+enum {
+   MLX5_SEND_WQE_BB= 64,
+   MLX5_SEND_WQE_SHIFT = 6,
+};
+
 /*
  * RX Hash fields enable to set which incoming packet's field should
  * participates in RX Hash. Each flag represent certain packet's field,
-- 
2.16.1.windows.4



[dpdk-dev] [RFC PATCH 08/10] crypto/mlx5: use OS agnostic functions for Verbs operations

2021-09-14 Thread Tal Shnaiderman
use the functions mlx5_os_open_device_context, mlx5_os_get_ctx_device_name
mlx5_os_reg_mr mlx5_os_dereg_mr instead of the ib verbs functions
and variables to support device operations on all OSs.

Signed-off-by: Tal Shnaiderman 
---
 drivers/crypto/mlx5/mlx5_crypto.c | 41 +--
 drivers/crypto/mlx5/mlx5_crypto.h |  2 +-
 2 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.c 
b/drivers/crypto/mlx5/mlx5_crypto.c
index 35319d0115..3f5a6745dc 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -796,9 +796,6 @@ mlx5_crypto_hw_global_release(struct mlx5_crypto_priv *priv)
 static int
 mlx5_crypto_pd_create(struct mlx5_crypto_priv *priv)
 {
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-   struct mlx5dv_obj obj;
-   struct mlx5dv_pd pd_info;
int ret;
 
priv->pd = mlx5_os_alloc_pd(priv->ctx);
@@ -814,11 +811,6 @@ mlx5_crypto_pd_create(struct mlx5_crypto_priv *priv)
return -errno;
}
return 0;
-#else
-   (void)priv;
-   DRV_LOG(ERR, "Cannot get pdn - no DV support.");
-   return -ENOTSUP;
-#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 }
 
 static int
@@ -964,8 +956,8 @@ mlx5_crypto_mr_mem_event_cb(enum rte_mem_event event_type, 
const void *addr,
/* Iterate all the existing mlx5 devices. */
TAILQ_FOREACH(priv, &mlx5_crypto_priv_list, next)
mlx5_free_mr_by_addr(&priv->mr_scache,
-priv->ctx->device->name,
-addr, len);
+mlx5_os_get_ctx_device_name(
+priv->ctx), addr, len);
pthread_mutex_unlock(&priv_list_lock);
break;
case RTE_MEM_EVENT_ALLOC:
@@ -977,9 +969,9 @@ mlx5_crypto_mr_mem_event_cb(enum rte_mem_event event_type, 
const void *addr,
 static int
 mlx5_crypto_dev_probe(struct rte_device *dev)
 {
-   struct ibv_device *ibv;
struct rte_cryptodev *crypto_dev;
-   struct ibv_context *ctx;
+   void *ctx;
+   const char *device_name;
struct mlx5_devx_obj *login;
struct mlx5_crypto_priv *priv;
struct mlx5_crypto_devarg_params devarg_prms = { 0 };
@@ -999,15 +991,19 @@ mlx5_crypto_dev_probe(struct rte_device *dev)
rte_errno = ENOTSUP;
return -rte_errno;
}
-   ibv = mlx5_os_get_ibv_dev(dev);
-   if (ibv == NULL)
-   return -rte_errno;
-   ctx = mlx5_glue->dv_open_device(ibv);
+   ctx = mlx5_os_open_device_context(dev);
if (ctx == NULL) {
-   DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
+   DRV_LOG(ERR, "Failed to open IB device.");
rte_errno = ENODEV;
return -rte_errno;
}
+   device_name = mlx5_os_get_ctx_device_name(ctx);
+   if (!device_name) {
+   DRV_LOG(ERR, "Failed getting device name");
+   claim_zero(mlx5_glue->close_device(ctx));
+   rte_errno = ENODEV;
+   return -ENODEV;
+   }
if (mlx5_devx_cmd_query_hca_attr(ctx, &attr) != 0 ||
attr.crypto == 0 || attr.aes_xts == 0) {
DRV_LOG(ERR, "Not enough capabilities to support crypto "
@@ -1029,15 +1025,14 @@ mlx5_crypto_dev_probe(struct rte_device *dev)
claim_zero(mlx5_glue->close_device(ctx));
return -rte_errno;
}
-   crypto_dev = rte_cryptodev_pmd_create(ibv->name, dev,
-   &init_params);
+   crypto_dev = rte_cryptodev_pmd_create(device_name, dev, &init_params);
if (crypto_dev == NULL) {
-   DRV_LOG(ERR, "Failed to create device \"%s\".", ibv->name);
+   DRV_LOG(ERR, "Failed to create device \"%s\".", device_name);
claim_zero(mlx5_glue->close_device(ctx));
return -ENODEV;
}
DRV_LOG(INFO,
-   "Crypto device %s was created successfully.", ibv->name);
+   "Crypto device %s was created successfully.", device_name);
crypto_dev->dev_ops = &mlx5_crypto_ops;
crypto_dev->dequeue_burst = mlx5_crypto_dequeue_burst;
crypto_dev->enqueue_burst = mlx5_crypto_enqueue_burst;
@@ -1061,8 +1056,8 @@ mlx5_crypto_dev_probe(struct rte_device *dev)
rte_errno = ENOMEM;
return -rte_errno;
}
-   priv->mr_scache.reg_mr_cb = mlx5_common_verbs_reg_mr;
-   priv->mr_scache.dereg_mr_cb = mlx5_common_verbs_dereg_mr;
+   priv->mr_scache.reg_mr_cb = mlx5_os_reg_mr;
+   priv->mr_scache.dereg_mr_cb = mlx5_os_dereg_mr;
priv->keytag = rte_cpu_to_be_64(devarg_prms.keytag);
priv->max_segs_num = devarg_prms.max_segs_num;
priv->umr_wqe_size = sizeof(struct mlx5_wqe_umr_bsf_seg) +
diff --git a/drivers/cr

[dpdk-dev] [RFC PATCH 04/10] common/mlx5: add memory region OS agnostic functions for Linux

2021-09-14 Thread Tal Shnaiderman
The OS agnostic functions for memory region registration/deregistration
(mlx5_os_reg_mr mlx5_os_dereg_mr) exist only for Windows OS.

Adding them for Linux as well as they are needed for memory region
activities in shared code.

Signed-off-by: Tal Shnaiderman 
---
 drivers/common/mlx5/linux/mlx5_common_os.c   | 35 
 drivers/common/mlx5/mlx5_common.h|  9 +++
 drivers/common/mlx5/windows/mlx5_common_os.c |  2 +-
 drivers/common/mlx5/windows/mlx5_common_os.h |  6 -
 4 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c 
b/drivers/common/mlx5/linux/mlx5_common_os.c
index 4aada82669..fd0ec6b748 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -491,3 +491,38 @@ mlx5_os_get_pdn(void *pd, uint32_t *pdn)
return -ENOTSUP;
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 }
+
+/**
+ * Register mr. Given protection domain pointer, pointer to addr and length
+ * register the memory region.
+ *
+ * @param[in] pd
+ *   Pointer to protection domain context (type mlx5_pd).
+ * @param[in] addr
+ *   Pointer to memory start address (type devx_device_ctx).
+ * @param[in] length
+ *   Lengtoh of the memory to register.
+ * @param[out] pmd_mr
+ *   pmd_mr struct set with lkey, address, length, pointer to mr object, mkey
+ *
+ * @return
+ *   0 on successful registration, -1 otherwise
+ */
+int
+mlx5_os_reg_mr(void *pd,
+  void *addr, size_t length, struct mlx5_pmd_mr *pmd_mr)
+{
+   return mlx5_common_verbs_reg_mr(pd, addr, length, pmd_mr);
+}
+
+/**
+ * De-register mr.
+ *
+ * @param[in] pmd_mr
+ *  Pointer to PMD mr object
+ */
+void
+mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
+{
+   mlx5_common_verbs_dereg_mr(pmd_mr);
+}
diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index fcdf376193..a87318db91 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -21,6 +21,7 @@
 
 #include "mlx5_prm.h"
 #include "mlx5_devx_cmds.h"
+#include "mlx5_common_mr.h"
 #include "mlx5_common_os.h"
 
 /* Reported driver name. */
@@ -427,4 +428,12 @@ __rte_internal
 int
 mlx5_os_get_pdn(void *pd, uint32_t *pdn);
 
+__rte_internal
+int
+mlx5_os_reg_mr(void *pd,
+  void *addr, size_t length, struct mlx5_pmd_mr *pmd_mr);
+__rte_internal
+void
+mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c 
b/drivers/common/mlx5/windows/mlx5_common_os.c
index 5c9cccd3e9..2ecdf78310 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -134,7 +134,7 @@ mlx5_os_umem_dereg(void *pumem)
 }
 
 /**
- * Register mr. Given protection doamin pointer, pointer to addr and length
+ * Register mr. Given protection domain pointer, pointer to addr and length
  * register the memory region.
  *
  * @param[in] pd
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.h 
b/drivers/common/mlx5/windows/mlx5_common_os.h
index c3d74d3b67..62bdcb40cd 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.h
+++ b/drivers/common/mlx5/windows/mlx5_common_os.h
@@ -13,7 +13,6 @@
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
 #include "mlx5_malloc.h"
-#include "mlx5_common_mr.h"
 #include "mlx5_win_ext.h"
 
 #define MLX5_BF_OFFSET 0x800
@@ -256,11 +255,6 @@ __rte_internal
 void *mlx5_os_umem_reg(void *ctx, void *addr, size_t size, uint32_t access);
 __rte_internal
 int mlx5_os_umem_dereg(void *pumem);
-__rte_internal
-int mlx5_os_reg_mr(void *pd,
-  void *addr, size_t length, struct mlx5_pmd_mr *pmd_mr);
-__rte_internal
-void mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 int mlx5_os_match_devx_devices_to_addr(struct devx_device_bdf *devx_bdf,
struct rte_pci_addr *addr);
 #endif /* RTE_PMD_MLX5_COMMON_OS_H_ */
-- 
2.16.1.windows.4



[dpdk-dev] [RFC PATCH 05/10] crypto/mlx5: replace UNIX functions with EAL functions

2021-09-14 Thread Tal Shnaiderman
Use the OS agnostic EAL function rte_mem_page_size to get
page size value instead of the Linux specific implementation.

Also remove the usage of PTHREAD_MUTEX_INITIALIZER which is not
support in Windows and initialize priv_list_lock in RTE_INIT.

Signed-off-by: Tal Shnaiderman 
---
 drivers/crypto/mlx5/mlx5_crypto.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.c 
b/drivers/crypto/mlx5/mlx5_crypto.c
index b3d5200ca3..3dac69f860 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,7 +34,7 @@
 
 TAILQ_HEAD(mlx5_crypto_privs, mlx5_crypto_priv) mlx5_crypto_priv_list =
TAILQ_HEAD_INITIALIZER(mlx5_crypto_priv_list);
-static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
+static pthread_mutex_t priv_list_lock;
 
 int mlx5_crypto_logtype;
 
@@ -700,7 +701,7 @@ mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, 
uint16_t qp_id,
attr.pd = priv->pdn;
attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar);
attr.cqn = qp->cq_obj.cq->id;
-   attr.log_page_size = rte_log2_u32(sysconf(_SC_PAGESIZE));
+   attr.log_page_size = rte_log2_u32(rte_mem_page_size());
attr.rq_size =  0;
attr.sq_size = RTE_BIT32(log_nb_desc);
attr.dbr_umem_valid = 1;
@@ -1134,6 +1135,7 @@ static struct mlx5_class_driver mlx5_crypto_driver = {
 
 RTE_INIT(rte_mlx5_crypto_init)
 {
+   pthread_mutex_init(&priv_list_lock, NULL);
mlx5_common_init();
if (mlx5_glue != NULL)
mlx5_class_driver_register(&mlx5_crypto_driver);
-- 
2.16.1.windows.4



[dpdk-dev] [RFC PATCH 10/10] crypto/mlx5: support on Windows

2021-09-14 Thread Tal Shnaiderman
Add support for mlx5 crypto pmd on Windows OS.

Signed-off-by: Tal Shnaiderman 
---
 drivers/common/mlx5/version.map  | 2 +-
 drivers/crypto/aesni_gcm/meson.build | 6 ++
 drivers/crypto/aesni_mb/meson.build  | 6 ++
 drivers/crypto/armv8/meson.build | 6 ++
 drivers/crypto/bcmfs/meson.build | 6 ++
 drivers/crypto/ccp/meson.build   | 1 +
 drivers/crypto/kasumi/meson.build| 6 ++
 drivers/crypto/meson.build   | 3 ---
 drivers/crypto/mlx5/meson.build  | 4 ++--
 drivers/crypto/mvsam/meson.build | 6 ++
 drivers/crypto/null/meson.build  | 6 ++
 drivers/crypto/octeontx/meson.build  | 6 ++
 drivers/crypto/openssl/meson.build   | 6 ++
 drivers/crypto/qat/meson.build   | 6 ++
 drivers/crypto/scheduler/meson.build | 6 ++
 drivers/crypto/snow3g/meson.build| 6 ++
 drivers/crypto/virtio/meson.build| 6 ++
 drivers/crypto/zuc/meson.build   | 6 ++
 18 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index c6de706fdb..f595ef30fb 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -17,7 +17,7 @@ INTERNAL {
mlx5_dev_is_pci;
mlx5_dev_to_pci_str;
 
-   mlx5_devx_alloc_uar; # WINDOWS_NO_EXPORT
+   mlx5_devx_alloc_uar;
 
mlx5_devx_cmd_alloc_pd;
mlx5_devx_cmd_create_conn_track_offload_obj;
diff --git a/drivers/crypto/aesni_gcm/meson.build 
b/drivers/crypto/aesni_gcm/meson.build
index 0fcac2a8eb..7d0140ff22 100644
--- a/drivers/crypto/aesni_gcm/meson.build
+++ b/drivers/crypto/aesni_gcm/meson.build
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
 IMB_required_ver = '0.52.0'
 lib = cc.find_library('IPSec_MB', required: false)
 if not lib.found()
diff --git a/drivers/crypto/aesni_mb/meson.build 
b/drivers/crypto/aesni_mb/meson.build
index ed6b9f53e4..b7512383c3 100644
--- a/drivers/crypto/aesni_mb/meson.build
+++ b/drivers/crypto/aesni_mb/meson.build
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
 IMB_required_ver = '0.52.0'
 lib = cc.find_library('IPSec_MB', required: false)
 if not lib.found()
diff --git a/drivers/crypto/armv8/meson.build b/drivers/crypto/armv8/meson.build
index 40a4dbb7bb..5effba8bbc 100644
--- a/drivers/crypto/armv8/meson.build
+++ b/drivers/crypto/armv8/meson.build
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2019 Arm Limited
 
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
 dep = dependency('libAArch64crypto', required: false, method: 'pkg-config')
 if not dep.found()
 build = false
diff --git a/drivers/crypto/bcmfs/meson.build b/drivers/crypto/bcmfs/meson.build
index d67e78d51b..5842f83a3b 100644
--- a/drivers/crypto/bcmfs/meson.build
+++ b/drivers/crypto/bcmfs/meson.build
@@ -3,6 +3,12 @@
 # All rights reserved.
 #
 
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
 deps += ['eal', 'bus_vdev']
 sources = files(
 'bcmfs_logs.c',
diff --git a/drivers/crypto/ccp/meson.build b/drivers/crypto/ccp/meson.build
index 0f82b9b90b..a4f3406009 100644
--- a/drivers/crypto/ccp/meson.build
+++ b/drivers/crypto/ccp/meson.build
@@ -4,6 +4,7 @@
 if not is_linux
 build = false
 reason = 'only supported on Linux'
+subdir_done()
 endif
 dep = dependency('libcrypto', required: false, method: 'pkg-config')
 if not dep.found()
diff --git a/drivers/crypto/kasumi/meson.build 
b/drivers/crypto/kasumi/meson.build
index e6e0f08c3d..966b8a5214 100644
--- a/drivers/crypto/kasumi/meson.build
+++ b/drivers/crypto/kasumi/meson.build
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018-2020 Intel Corporation
 
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
 IMB_required_ver = '0.53.0'
 lib = cc.find_library('IPSec_MB', required: false)
 if not lib.found()
diff --git a/drivers/crypto/meson.build b/drivers/crypto/meson.build
index ea239f4c56..c49ec501d4 100644
--- a/drivers/crypto/meson.build
+++ b/drivers/crypto/meson.build
@@ -1,9 +1,6 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-if is_windows
-subdir_done()
-endif
 
 drivers = [
 'aesni_gcm',
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index 1d6e413dd5..9d9c9c00bc 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -1,9 +1,9 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright (c) 2021 NVIDIA Corporation & Affiliates
 
-if not is_linux
+if not (is_linux or is_windows)

[dpdk-dev] [RFC PATCH 03/10] common/mlx5: move pdn getter to common driver

2021-09-14 Thread Tal Shnaiderman
Move to common and export the function mlx5_os_get_pdn.

Signed-off-by: Tal Shnaiderman 
---
 drivers/common/mlx5/linux/mlx5_common_os.c   | 35 
 drivers/common/mlx5/mlx5_common.h|  4 
 drivers/common/mlx5/version.map  |  1 +
 drivers/common/mlx5/windows/mlx5_common_os.c | 21 +
 drivers/net/mlx5/linux/mlx5_os.c | 35 
 drivers/net/mlx5/mlx5.h  |  1 -
 drivers/net/mlx5/windows/mlx5_os.c   | 21 -
 7 files changed, 61 insertions(+), 57 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c 
b/drivers/common/mlx5/linux/mlx5_common_os.c
index 3ef507944f..4aada82669 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -456,3 +456,38 @@ mlx5_os_open_device_context(struct rte_device *dev)
}
return ctx;
 }
+
+/**
+ * Extract pdn of PD object using DV API.
+ *
+ * @param[in] pd
+ *   Pointer to the verbs PD object.
+ * @param[out] pdn
+ *   Pointer to the PD object number variable.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+int
+mlx5_os_get_pdn(void *pd, uint32_t *pdn)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+   struct mlx5dv_obj obj;
+   struct mlx5dv_pd pd_info;
+   int ret = 0;
+
+   obj.pd.in = pd;
+   obj.pd.out = &pd_info;
+   ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
+   if (ret) {
+   DRV_LOG(DEBUG, "Fail to get PD object info");
+   return ret;
+   }
+   *pdn = pd_info.pdn;
+   return 0;
+#else
+   (void)pd;
+   (void)pdn;
+   return -ENOTSUP;
+#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
+}
diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index 249804b00c..fcdf376193 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -423,4 +423,8 @@ __rte_internal
 void *
 mlx5_os_open_device_context(struct rte_device *dev);
 
+__rte_internal
+int
+mlx5_os_get_pdn(void *pd, uint32_t *pdn);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 6d4258dd25..c6de706fdb 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -144,6 +144,7 @@ INTERNAL {
mlx5_os_match_devx_devices_to_addr;
mlx5_os_open_device_context;
mlx5_os_get_ibv_dev; # WINDOWS_NO_EXPORT
+   mlx5_os_get_pdn;
mlx5_os_reg_mr;
mlx5_os_umem_dereg;
mlx5_os_umem_reg;
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c 
b/drivers/common/mlx5/windows/mlx5_common_os.c
index 3b59e57e57..5c9cccd3e9 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -323,3 +323,24 @@ mlx5_os_open_device_context(struct rte_device *dev)
mlx5_glue->free_device_list(orig_devx_bdf_devs);
return devx_ctx_match;
 }
+
+/**
+ * Extract pdn of PD object using DevX
+ *
+ * @param[in] pd
+ *   Pointer to the DevX PD object.
+ * @param[out] pdn
+ *   Pointer to the PD object number variable.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+int
+mlx5_os_get_pdn(void *pd, uint32_t *pdn)
+{
+   if (!pd)
+   return -EINVAL;
+
+   *pdn = ((struct mlx5_pd *)pd)->pdn;
+   return 0;
+}
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 470b16cb9a..a7df1ddb2e 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2811,41 +2811,6 @@ mlx5_restore_doorbell_mapping_env(int value)
setenv(MLX5_SHUT_UP_BF, value ? "1" : "0", 1);
 }
 
-/**
- * Extract pdn of PD object using DV API.
- *
- * @param[in] pd
- *   Pointer to the verbs PD object.
- * @param[out] pdn
- *   Pointer to the PD object number variable.
- *
- * @return
- *   0 on success, error value otherwise.
- */
-int
-mlx5_os_get_pdn(void *pd, uint32_t *pdn)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-   struct mlx5dv_obj obj;
-   struct mlx5dv_pd pd_info;
-   int ret = 0;
-
-   obj.pd.in = pd;
-   obj.pd.out = &pd_info;
-   ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
-   if (ret) {
-   DRV_LOG(DEBUG, "Fail to get PD object info");
-   return ret;
-   }
-   *pdn = pd_info.pdn;
-   return 0;
-#else
-   (void)pd;
-   (void)pdn;
-   return -ENOTSUP;
-#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
-}
-
 /**
  * Function API to open IB device.
  *
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e02714e231..cb05929efe 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1770,7 +1770,6 @@ void mlx5_os_free_shared_dr(struct mlx5_priv *priv);
 int mlx5_os_open_device(const struct mlx5_dev_spawn_data *spawn,
 const struct mlx5_dev_config *config,
 struct mlx5_

[dpdk-dev] [RFC PATCH 07/10] crypto/mlx5: use OS agnostic functions for PD operations

2021-09-14 Thread Tal Shnaiderman
use the functions mlx5_os_alloc_pd, mlx5_os_dealloc_pd
mlx5_os_get_pdn instead of the glue functions to support
PD operations on all OSs.

Signed-off-by: Tal Shnaiderman 
---
 drivers/crypto/mlx5/mlx5_crypto.c | 15 ++-
 drivers/crypto/mlx5/mlx5_crypto.h |  2 +-
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.c 
b/drivers/crypto/mlx5/mlx5_crypto.c
index ccae113770..35319d0115 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -784,7 +784,7 @@ static void
 mlx5_crypto_hw_global_release(struct mlx5_crypto_priv *priv)
 {
if (priv->pd != NULL) {
-   claim_zero(mlx5_glue->dealloc_pd(priv->pd));
+   claim_zero(mlx5_os_dealloc_pd(priv->pd));
priv->pd = NULL;
}
if (priv->uar != NULL) {
@@ -801,21 +801,18 @@ mlx5_crypto_pd_create(struct mlx5_crypto_priv *priv)
struct mlx5dv_pd pd_info;
int ret;
 
-   priv->pd = mlx5_glue->alloc_pd(priv->ctx);
+   priv->pd = mlx5_os_alloc_pd(priv->ctx);
if (priv->pd == NULL) {
DRV_LOG(ERR, "Failed to allocate PD.");
return errno ? -errno : -ENOMEM;
}
-   obj.pd.in = priv->pd;
-   obj.pd.out = &pd_info;
-   ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
+   ret = mlx5_os_get_pdn(priv->pd, &priv->pdn);
if (ret != 0) {
-   DRV_LOG(ERR, "Fail to get PD object info.");
-   mlx5_glue->dealloc_pd(priv->pd);
+   DRV_LOG(ERR, "Fail to get PDN.");
+   mlx5_os_dealloc_pd(priv->pd);
priv->pd = NULL;
return -errno;
}
-   priv->pdn = pd_info.pdn;
return 0;
 #else
(void)priv;
@@ -834,7 +831,7 @@ mlx5_crypto_hw_global_prepare(struct mlx5_crypto_priv *priv)
priv->uar_addr = mlx5_os_get_devx_uar_reg_addr(priv->uar);
if (priv->uar == NULL || priv->uar_addr == NULL) {
rte_errno = errno;
-   claim_zero(mlx5_glue->dealloc_pd(priv->pd));
+   claim_zero(mlx5_os_dealloc_pd(priv->pd));
DRV_LOG(ERR, "Failed to allocate UAR.");
return -1;
}
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h 
b/drivers/crypto/mlx5/mlx5_crypto.h
index d5cc509e42..91e3f438b8 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -25,7 +25,7 @@ struct mlx5_crypto_priv {
volatile uint64_t *uar_addr;
uint32_t pdn; /* Protection Domain number. */
uint32_t max_segs_num; /* Maximum supported data segs. */
-   struct ibv_pd *pd;
+   void *pd;
struct mlx5_hlist *dek_hlist; /* Dek hash list. */
struct rte_cryptodev_config dev_config;
struct mlx5_mr_share_cache mr_scache; /* Global shared MR cache. */
-- 
2.16.1.windows.4



[dpdk-dev] [RFC PATCH 06/10] crypto/mlx5: use OS agnostic functions for UMEM operations

2021-09-14 Thread Tal Shnaiderman
use the functions mlx5_os_umem_reg, mlx5_os_umem_dereg
mlx5_os_get_umem_id instead of the glue functions to support
UMEM operations on all OSs.

Signed-off-by: Tal Shnaiderman 
---
 drivers/crypto/mlx5/mlx5_crypto.c | 14 +++---
 drivers/crypto/mlx5/mlx5_crypto.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.c 
b/drivers/crypto/mlx5/mlx5_crypto.c
index 3dac69f860..ccae113770 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -261,7 +261,7 @@ mlx5_crypto_queue_pair_release(struct rte_cryptodev *dev, 
uint16_t qp_id)
if (qp->qp_obj != NULL)
claim_zero(mlx5_devx_cmd_destroy(qp->qp_obj));
if (qp->umem_obj != NULL)
-   claim_zero(mlx5_glue->devx_umem_dereg(qp->umem_obj));
+   claim_zero(mlx5_os_umem_dereg(qp->umem_obj));
if (qp->umem_buf != NULL)
rte_free(qp->umem_buf);
mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
@@ -682,10 +682,10 @@ mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, 
uint16_t qp_id,
rte_errno = ENOMEM;
goto error;
}
-   qp->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
-  (void *)(uintptr_t)qp->umem_buf,
-  umem_size,
-  IBV_ACCESS_LOCAL_WRITE);
+   qp->umem_obj = mlx5_os_umem_reg(priv->ctx,
+   (void *)(uintptr_t)qp->umem_buf,
+   umem_size,
+   IBV_ACCESS_LOCAL_WRITE);
if (qp->umem_obj == NULL) {
DRV_LOG(ERR, "Failed to register QP umem.");
goto error;
@@ -705,9 +705,9 @@ mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, 
uint16_t qp_id,
attr.rq_size =  0;
attr.sq_size = RTE_BIT32(log_nb_desc);
attr.dbr_umem_valid = 1;
-   attr.wq_umem_id = qp->umem_obj->umem_id;
+   attr.wq_umem_id = mlx5_os_get_umem_id(qp->umem_obj);
attr.wq_umem_offset = 0;
-   attr.dbr_umem_id = qp->umem_obj->umem_id;
+   attr.dbr_umem_id = mlx5_os_get_umem_id(qp->umem_obj);
attr.dbr_address = RTE_BIT64(log_nb_desc) * priv->wqe_set_size;
qp->qp_obj = mlx5_devx_cmd_create_qp(priv->ctx, &attr);
if (qp->qp_obj == NULL) {
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h 
b/drivers/crypto/mlx5/mlx5_crypto.h
index d49b0001f0..d5cc509e42 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -45,7 +45,7 @@ struct mlx5_crypto_qp {
struct mlx5_devx_cq cq_obj;
struct mlx5_devx_obj *qp_obj;
struct rte_cryptodev_stats stats;
-   struct mlx5dv_devx_umem *umem_obj;
+   void *umem_obj;
void *umem_buf;
volatile uint32_t *db_rec;
struct rte_crypto_op **ops;
-- 
2.16.1.windows.4



[dpdk-dev] [RFC PATCH 09/10] crypto/mlx5: fix size of UMR WQE

2021-09-14 Thread Tal Shnaiderman
The size of the UMR WQE allocated object is decided by a sizof
operation on the struct, however since the struct contains
a union of flexible array members this sizeof results can differ
between compilers.

GCC for example treats the union as 0 sized, MSVC adds a padding
of 16Bits.

To resolve the ambiguity the allocation size will be calculated
by the sizes of the members excluding the flexible union.

Fixes: a1978aa23bf4 ("crypto/mlx5: add maximum segments configuration")
Cc: sta...@dpdk.org

Signed-off-by: Tal Shnaiderman 
---
 drivers/crypto/mlx5/mlx5_crypto.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.c 
b/drivers/crypto/mlx5/mlx5_crypto.c
index 3f5a6745dc..4b8d561e33 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -1061,7 +1061,9 @@ mlx5_crypto_dev_probe(struct rte_device *dev)
priv->keytag = rte_cpu_to_be_64(devarg_prms.keytag);
priv->max_segs_num = devarg_prms.max_segs_num;
priv->umr_wqe_size = sizeof(struct mlx5_wqe_umr_bsf_seg) +
-sizeof(struct mlx5_umr_wqe) +
+sizeof(struct mlx5_wqe_cseg) +
+sizeof(struct mlx5_wqe_umr_cseg) +
+sizeof(struct mlx5_wqe_mkey_cseg) +
 RTE_ALIGN(priv->max_segs_num, 4) *
 sizeof(struct mlx5_wqe_dseg);
rdmw_wqe_size = sizeof(struct mlx5_rdma_write_wqe) +
-- 
2.16.1.windows.4



Re: [dpdk-dev] [PATCH] net/virtio: report max/min/align desc limits in dev info get

2021-09-14 Thread Maxime Coquelin



On 8/20/21 2:48 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Report max/min/align descriptors limits in device info get callback.
> Before calling the callback, rte_eth_dev_info_get() provides
> default values of nb_min as zero and nb_max as UINT16_MAX that are
> not correct for the driver, so one can't rely on them.
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 25 +
>  1 file changed, 25 insertions(+)
> 

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



[dpdk-dev] [RFC v2] vhost: add support async dequeue for packed ring

2021-09-14 Thread Cheng Jiang
This patch implements asynchronous dequeue data path for packed ring.

Signed-off-by: Cheng Jiang 
---
It's based on these 2 patches:
1. vhost: remove copy threshold for async vhost
http://patches.dpdk.org/project/dpdk/patch/1629463466-450012-1-git-send-email-jiayu...@intel.com/
2. vhost: support async dequeue for split ring
http://patches.dpdk.org/project/dpdk/patch/20210906204837.112466-2-wenwux...@intel.com/

v2:
 * fixed some issues

 lib/vhost/virtio_net.c | 325 ++---
 1 file changed, 302 insertions(+), 23 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index e0159b53e3..9a842ce8f4 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1654,7 +1654,7 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 }
 
 static __rte_always_inline void
-vhost_update_used_packed(struct vhost_virtqueue *vq,
+vhost_enqueue_update_used_packed(struct vhost_virtqueue *vq,
struct vring_used_elem_packed *shadow_ring,
uint16_t count)
 {
@@ -1970,22 +1970,66 @@ write_back_completed_descs_split(struct vhost_virtqueue 
*vq, uint16_t n_descs)
} while (nr_left > 0);
 }
 
+static __rte_always_inline void
+vhost_dequeue_update_used_packed(struct vhost_virtqueue *vq,
+   struct vring_used_elem_packed *shadow_ring,
+   uint16_t count)
+{
+   uint16_t i;
+   uint16_t flags;
+   uint16_t head_idx = vq->last_used_idx;
+   uint16_t head_flags = 0;
+
+   for (i = 0; i < count; i++)
+   vq->desc_packed[vq->last_used_idx + i].id = shadow_ring[i].id;
+
+   /* The ordering for storing desc flags needs to be enforced. */
+   rte_atomic_thread_fence(__ATOMIC_RELEASE);
+
+   for (i = 0; i < count; i++) {
+   flags = vq->desc_packed[vq->last_used_idx].flags;
+   if (vq->used_wrap_counter) {
+   flags |= VRING_DESC_F_USED;
+   flags |= VRING_DESC_F_AVAIL;
+   } else {
+   flags &= ~VRING_DESC_F_USED;
+   flags &= ~VRING_DESC_F_AVAIL;
+   }
+
+   if (i > 0)
+   vq->desc_packed[vq->last_used_idx].flags = flags;
+   else
+   head_flags = flags;
+
+   vq_inc_last_used_packed(vq, 1);
+   }
+
+   vq->desc_packed[head_idx].flags = head_flags;
+}
+
 static __rte_always_inline void
 write_back_completed_descs_packed(struct vhost_virtqueue *vq,
-   uint16_t n_buffers)
+   uint16_t n_buffers, bool is_txq)
 {
uint16_t nr_left = n_buffers;
uint16_t from, to;
+   void (*update_used_packed)(struct vhost_virtqueue *vq,
+   struct vring_used_elem_packed *shadow_ring, 
uint16_t count);
+
+   if (is_txq)
+   update_used_packed = vhost_enqueue_update_used_packed;
+   else
+   update_used_packed = vhost_dequeue_update_used_packed;
 
do {
from = vq->last_async_buffer_idx_packed;
to = (from + nr_left) % vq->size;
if (to > from) {
-   vhost_update_used_packed(vq, vq->async_buffers_packed + 
from, to - from);
+   update_used_packed(vq, vq->async_buffers_packed + from, 
to - from);
vq->last_async_buffer_idx_packed += nr_left;
nr_left = 0;
} else {
-   vhost_update_used_packed(vq, vq->async_buffers_packed + 
from,
+   update_used_packed(vq, vq->async_buffers_packed + from,
vq->size - from);
vq->last_async_buffer_idx_packed = 0;
nr_left -= vq->size - from;
@@ -2049,7 +2093,7 @@ vhost_poll_enqueue_completed(struct virtio_net *dev, 
uint16_t queue_id,
 
if (likely(vq->enabled && vq->access_ok)) {
if (vq_is_packed(dev)) {
-   write_back_completed_descs_packed(vq, n_buffers);
+   write_back_completed_descs_packed(vq, n_buffers, 1);
 
vhost_vring_call_packed(dev, vq);
} else {
@@ -3328,7 +3372,7 @@ async_desc_to_mbuf(struct virtio_net *dev,
 }
 
 static __rte_always_inline uint16_t
-async_poll_dequeue_completed_split(struct virtio_net *dev,
+async_poll_dequeue_completed(struct virtio_net *dev,
struct vhost_virtqueue *vq, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count, bool legacy_ol_flags)
 {
@@ -3336,7 +3380,7 @@ async_poll_dequeue_completed_split(struct virtio_net *dev,
uint16_t start_idx, pkt_idx, from;
struct async_inflight_info *pkts_info;
 
-   pkt_idx = vq->async_pkts_idx & (vq->size - 1);
+   pkt_idx = vq->async_pkts_idx % vq->size;
pkts_info = vq->async_pkts

Re: [dpdk-dev] [PATCH v1 1/6] build: increase default of max lcores to 512

2021-09-14 Thread David Marchand
On Tue, Sep 14, 2021 at 11:34 AM David Hunt  wrote:
>
>
> On 10/9/2021 9:24 AM, Thomas Monjalon wrote:
> > 10/09/2021 10:06, David Marchand:
> >> On Fri, Sep 10, 2021 at 9:54 AM Bruce Richardson
> >>  wrote:
> >>> On Fri, Sep 10, 2021 at 08:51:04AM +0200, David Marchand wrote:
>  On Thu, Sep 9, 2021 at 4:38 PM Bruce Richardson
>   wrote:
> > On Thu, Sep 09, 2021 at 02:45:06PM +0100, David Hunt wrote:
> >> Modern processors are coming with an ever increasing number of cores,
> >> and 128 does not seem like a sensible max limit any more, especially
> >> when you consider multi-socket systems with Hyper-Threading enabled.
> >>
> >> This patch increases max_lcores default from 128 to 512.
> >>
> >> Signed-off-by: David Hunt 
>  Why should we need this?
> 
>  --lcores makes it possible to pin 128 lcores to any physical core on
>  your system.
>  And for applications that have their own thread management, they can
>  pin thread, then use rte_thread_register.
> 
>  Do you have applications that require more than 128 lcores?
> 
> >>> The trouble is that using the --lcores syntax for mapping high core 
> >>> numbers
> >>> to low lcore ids is much more awkward to use. Every case of DPDK use I've
> >>> seen uses -c with a coremask, or -l with just giving a few core numbers on
> >>> it. This simple scheme won't work with core numbers greater than 128, and
> >>> there are already systems available with more than that number of cores.
> >>>
> >>> Apart from the memory footprint issues - which this patch is already 
> >>> making
> >>> a good start in addressing, why would we not increase the default
> >>> max_lcores to that seen on real systems?
> >> The memory footprint is a major issue to me, and reserving all those
> >> lcores won't be needed in any system.
> >> We will also have to decide on a "640k ought to be enough" value to
> >> avoid ABI issue with the next processor that comes out and has more
> >> than 512 cores.
> >>
> >> Could we wire the -c / -l options to --lcores behavior ?
> >> It breaks the 1:1 lcore/physical core assumption, but it solves your
> >> usability issue.
> > Why would we change existing options while we already have an option
> > (--lcores) which solves the issue above?
> > I think the only issue is to educate users.
> > Is there something to improve in the documentation?
> >
>
> Hi all,
> I agree that it’s a good idea to switch to using the “--lcrores” option

Let's avoid typo in the error message you'll add :-).


> for cores above the default, that’s already future proofed.
> However, I’m still a little concerned about usability, if our users are
> accustomed to the “-c” and “-l” options, I suggest that we add a warning
> to suggest using the “--lcores” option if any of the cores provided on
> the command line are above RTE_MAX_LCORE. That would help them with the
> solution to using physical cores above 128 (or whatever the compiled
> default is).
>
> Example:
>
> “ERROR: logical core 212 is above the maximum lcore number permitted.
> Please use the --lcores option to map lcores onto physical cores, e.g.
> --lcores="(0-3)@(212-215).”

If you could directly provide the right --lcores syntax based on what
user provided with -c or -l, it would be even better.
This should be not that difficult.


>
> I’ll replace the first patch in the set with a patch that adds the
> additional information in the error message.



-- 
David Marchand



Re: [dpdk-dev] [PATCH v2] config/ppc: ignore gcc 11 psabi warnings

2021-09-14 Thread David Marchand
On Tue, Sep 14, 2021 at 11:18 AM Ferruh Yigit  wrote:
>
> On 9/3/2021 12:53 AM, David Christensen wrote:
> > Suppress the gcc warning "note: the layout of aggregates containing
> > vectors with 4-byte alignment has changed in GCC 5" on POWER systems
> > by setting "-Wno-psabi".  Warning was originally added to gcc in
> > commit https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9832651 to warn
> > of the vector alignment changes introduced in GCC 5.  Older gcc
> > versions forced vector alignment to 16 bytes due to requirements for
> > POWER 6 and earlier CPUs, but these restrictions don't apply to CPUs
> > supported by DPDK.
> >
> > Bugzilla ID: 739
> >
> > Signed-off-by: David Christensen 
> > ---
> > v2:
> > - update copyright year
> > - rebase for 21.11-rc0
> > ---
> >  config/ppc/meson.build | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/config/ppc/meson.build b/config/ppc/meson.build
> > index adf49e1f42..5354db4e0a 100644
> > --- a/config/ppc/meson.build
> > +++ b/config/ppc/meson.build
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: BSD-3-Clause
> >  # Copyright(c) 2018 Luca Boccassi 
> > +# Copyright(c) 2021 IBM Corporation
> >
> >  if not dpdk_conf.get('RTE_ARCH_64')
> >  error('Only 64-bit compiles are supported for this platform type')
> > @@ -17,6 +18,12 @@ if not power9_supported
> >  dpdk_conf.set('RTE_MACHINE','power8')
> >  endif
> >
> > +# Suppress the gcc warning "note: the layout of aggregates containing
> > +# vectors with 4-byte alignment has changed in GCC 5".
> > +if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and 
> > cc.version().version_compare('<12.0') and cc.has_argument('-Wno-psabi')
> > +add_project_arguments('-Wno-psabi', language: 'c')
> > +endif
> > +
> >  # Certain POWER9 systems can scale as high as 1536 LCORES, but setting 
> > such a
> >  # high value can waste memory, cause timeouts in time limited autotests, 
> > and is
> >  # unlikely to be used in many production situations.  Similarly, keeping 
> > the
> > --
>
> I am getting following build error in my environment:
> "config/ppc/meson.build:23:6: ERROR: Unknown statement."
>
> The compiler I have is:
> powerpc64le-linux-gcc (gcc 10.2.0 "powerpc64le-linux-gcc.br_real (Buildroot
> 2020.08-14-ge5a2a90) 10.2.0")
>
> meson version: Version: 0.59.1

Good catch.
My fault, and I did not see it because I was still testing with gcc 9.


>
> Multi-line statements seems need to be merged with '\':
>
> diff --git a/config/ppc/meson.build b/config/ppc/meson.build
> index 0b1948fc7cb9..f95009c77e7a 100644
> --- a/config/ppc/meson.build
> +++ b/config/ppc/meson.build
> @@ -20,7 +20,7 @@ endif
>
>  # Suppress the gcc warning "note: the layout of aggregates containing
>  # vectors with 4-byte alignment has changed in GCC 5".
> -if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and
> +if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and \
>  cc.version().version_compare('<12.0') and 
> cc.has_argument('-Wno-psabi')

I based this update of mine on other existing multiline statements in dpdk.
But I did not notice that all of them are within parens.

So both \ and () are fine.

Bruce, I did not see this described in our meson coding style.
Do you have an opinion for multiline statements in meson?


-- 
David Marchand



Re: [dpdk-dev] [PATCH v2] config/ppc: ignore gcc 11 psabi warnings

2021-09-14 Thread Bruce Richardson
On Tue, Sep 14, 2021 at 12:16:36PM +0200, David Marchand wrote:
> On Tue, Sep 14, 2021 at 11:18 AM Ferruh Yigit  wrote:
> >
> > On 9/3/2021 12:53 AM, David Christensen wrote:
> > > Suppress the gcc warning "note: the layout of aggregates containing
> > > vectors with 4-byte alignment has changed in GCC 5" on POWER systems
> > > by setting "-Wno-psabi".  Warning was originally added to gcc in
> > > commit https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9832651 to warn
> > > of the vector alignment changes introduced in GCC 5.  Older gcc
> > > versions forced vector alignment to 16 bytes due to requirements for
> > > POWER 6 and earlier CPUs, but these restrictions don't apply to CPUs
> > > supported by DPDK.
> > >
> > > Bugzilla ID: 739
> > >
> > > Signed-off-by: David Christensen 
> > > ---
> > > v2:
> > > - update copyright year
> > > - rebase for 21.11-rc0
> > > ---
> > >  config/ppc/meson.build | 7 +++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/config/ppc/meson.build b/config/ppc/meson.build
> > > index adf49e1f42..5354db4e0a 100644
> > > --- a/config/ppc/meson.build
> > > +++ b/config/ppc/meson.build
> > > @@ -1,5 +1,6 @@
> > >  # SPDX-License-Identifier: BSD-3-Clause
> > >  # Copyright(c) 2018 Luca Boccassi 
> > > +# Copyright(c) 2021 IBM Corporation
> > >
> > >  if not dpdk_conf.get('RTE_ARCH_64')
> > >  error('Only 64-bit compiles are supported for this platform type')
> > > @@ -17,6 +18,12 @@ if not power9_supported
> > >  dpdk_conf.set('RTE_MACHINE','power8')
> > >  endif
> > >
> > > +# Suppress the gcc warning "note: the layout of aggregates containing
> > > +# vectors with 4-byte alignment has changed in GCC 5".
> > > +if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and 
> > > cc.version().version_compare('<12.0') and cc.has_argument('-Wno-psabi')
> > > +add_project_arguments('-Wno-psabi', language: 'c')
> > > +endif
> > > +
> > >  # Certain POWER9 systems can scale as high as 1536 LCORES, but setting 
> > > such a
> > >  # high value can waste memory, cause timeouts in time limited autotests, 
> > > and is
> > >  # unlikely to be used in many production situations.  Similarly, keeping 
> > > the
> > > --
> >
> > I am getting following build error in my environment:
> > "config/ppc/meson.build:23:6: ERROR: Unknown statement."
> >
> > The compiler I have is:
> > powerpc64le-linux-gcc (gcc 10.2.0 "powerpc64le-linux-gcc.br_real (Buildroot
> > 2020.08-14-ge5a2a90) 10.2.0")
> >
> > meson version: Version: 0.59.1
> 
> Good catch.
> My fault, and I did not see it because I was still testing with gcc 9.
> 
> 
> >
> > Multi-line statements seems need to be merged with '\':
> >
> > diff --git a/config/ppc/meson.build b/config/ppc/meson.build
> > index 0b1948fc7cb9..f95009c77e7a 100644
> > --- a/config/ppc/meson.build
> > +++ b/config/ppc/meson.build
> > @@ -20,7 +20,7 @@ endif
> >
> >  # Suppress the gcc warning "note: the layout of aggregates containing
> >  # vectors with 4-byte alignment has changed in GCC 5".
> > -if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and
> > +if cc.get_id() == 'gcc' and cc.version().version_compare('>=10.0') and \
> >  cc.version().version_compare('<12.0') and 
> > cc.has_argument('-Wno-psabi')
> 
> I based this update of mine on other existing multiline statements in dpdk.
> But I did not notice that all of them are within parens.
> 
> So both \ and () are fine.
> 
> Bruce, I did not see this described in our meson coding style.
> Do you have an opinion for multiline statements in meson?
> 

No, I don't have an opinion either way. If you want, we can pick one to use
in the coding style, but if we do we should also match the style for python
code, as I believe they both share this multi-line approach.


Re: [dpdk-dev] [RFC] Control packet event adapter and FIFO library

2021-09-14 Thread Kaladi, Ashok K
Hi Jerin,


> -Original Message-
> From: Jerin Jacob 
> Sent: Thursday, September 2, 2021 12:20 PM
> To: Kaladi, Ashok K 
> Cc: Harman Kalra ; Nithin Dabilpuram
> ; Yigit, Ferruh ;
> Burakov, Anatoly ; Richardson, Bruce
> ; Ananyev, Konstantin
> ; Thomas Monjalon
> ; David Marchand ;
> jer...@marvell.com; Jayatheerthan, Jay ;
> Carrillo, Erik G ; Gujjar, Abhinandan S
> ; dev@dpdk.org; Ayyadurai, Balasankar
> ; Jakub Grajciar ;
> mattias.ronnblom 
> Subject: Re: [dpdk-dev] [RFC] Control packet event adapter and FIFO library
> 
> On Thu, Sep 2, 2021 at 10:02 AM Kaladi, Ashok K 
> wrote:
> >
> > Hi Jerin,
> 
> Hi Ashok,
> 
> >
> >
> > -Original Message-
> > From: Jerin Jacob 
> > Sent: Wednesday, September 1, 2021 1:20 PM
> > To: Kaladi, Ashok K ; Harman Kalra
> > ; Nithin Dabilpuram ;
> > Yigit, Ferruh ; Burakov, Anatoly
> > ; Richardson, Bruce
> > ; Ananyev, Konstantin
> > ; Thomas Monjalon
> ;
> > David Marchand 
> > Cc: jer...@marvell.com; Jayatheerthan, Jay
> > ; Carrillo, Erik G
> > ; Gujjar, Abhinandan S
> > ; dev@dpdk.org; Ayyadurai, Balasankar
> > 
> > Subject: Re: [dpdk-dev] [RFC] Control packet event adapter and FIFO
> > library
> >
> > On Wed, Sep 1, 2021 at 11:55 AM Jerin Jacob 
> wrote:
> > >
> > > On Wed, Sep 1, 2021 at 11:12 AM Kaladi, Ashok K
> > >  wrote:
> > > >
> > > > Dear dpdk-dev team,
> > > >
> > > > We would like to propose the following RFC for your review.
> > > >
> > > > A user space application may need access to the packets handled by
> > > > eventdev based DPDK application. This application doesn't use mbuf
> > > > or eventdev based DPDK APIs. Presently this is not possible
> > > > without passing packets through DPDK KNI.
> > >
> > >
> > > I think it is an innovative idea it is useful for multiple use cases
> > > not just for eventdev.
> > >
> > > Some feedback on thoughts
> > >
> > > 1) The FIFO library should be generic it should not be specific to
> > > eventdev
> >
> > Agreed, it's planned to be generic.
> >
> > > 2) I think,  This FIFO library should be generic and  KNI also be a
> > > consumer of this library
> >
> > Agreed,  any adaptation needed in KNI can be taken up later.
> >
> > > 3) I think, FIFO should not be a device instead it should be an
> > > abstact object like rte_mempool *
> >
> > FIFO is comparable to queue. We will have a data structure which contains
> address of Rx, Tx, Alloc & Free FIFOs, number of queues etc.
> > This can be used to create a device. This method is similar to KNI -  struct
> kni_dev.
> >
> > > 4) We need to consider User space app can be another DPDK primary
> > > process or some non DPDK app
> >
> > Agreed
> >
> > > 4) I think, we can remove the Linux shared memory dependency instead
> > > of introduce some scheme of "exporting" memzone from DPDK
> > > application to another user space app or another DPDK primary process.
> > > I see the following reasons:
> > > - It is backed by hugepage so better performance
> > > - Is kernel do any memcpy when using Linux shm calls in kernel space?
> >
> > We are proposing to use POSIX complaint APIs shm_open(), mmap() APIs
> to create shared memory to avoid dependency on Linux.
> > The shared memory is created in Hugepages and contains mempool and
> mbufs. This is done by control packet adapter.
> > This avoids application to be aware of these DPDK constructs. It just needs
> to know about the simplified format defined by FIFO lib.
> > Proposed use case is for user space application which doesn’t need
> memcpy as mempool is in shared memory.
> > For Kernel application we may use similar approach as in KNI. This can be
> taken up later.
> 
> + memif maintainer ( jgraj...@cisco.com )
> 
> I just looked memif, based on a suggestion from @Mattias Rönnblom
> 
> Looks like memif is already solved this problem in a clean way and DPDK has
> support for the same as ethdev driver.
> I think, it has only a downside that it has Linux OS dependency due to
> memfd_create(). Any other downside for memif?
> I think, may be,  we need to weigh in pros and cons of memif vs new
> proposing library. Could you check the same?
> 
[Ashok Kaladi] 

Thanks for pointing it out. 
Checked the memif implementation and see that it doesn't have any other notable 
downsides than Linux dependency.
So we are withdrawing this RFC.

Regards
Ashok

> >
> > >
> > > Thoughts?
> > >
> > > May be you can share set of API prototypes without any
> > > implementation for the next level discussion if others are OK this kind of
> library.
> >
> >


Re: [dpdk-dev] [PATCH] efd: change data type of parameter

2021-09-14 Thread Kinsella, Ray



On 14/09/2021 08:10, David Marchand wrote:
> On Fri, Sep 10, 2021 at 6:54 PM Pablo de Lara
>  wrote:
>>
>> rte_efd_create() function was using uint8_t for a socket bitmask,
>> for one of its parameters.
>> This limits the maximum of NUMA sockets to be 8.
>> Changing to to uint64_t increases it to 64, which should be
>> more future-proof.
> 
> Cc: ppc maintainer, since I think powerX servers have non contiguous
> NUMA sockets.
> 
> 
>>
>> Coverity issue: 366390
>> Fixes: 56b6ef874f8 ("efd: new Elastic Flow Distributor library")
>>
>> Signed-off-by: Pablo de Lara 
>> ---
>>
>> This fix requires an API breakage and therefore it is not
>> a good candidate for backporting (besides, it is a very low impact bug).
>> Hence, I am not CC'ing stable.
> 
> This is an unannounced breakage for a stable API.
> Cc: techboard + Ray for awareness.

Understood.
Its low impact, at a time we are changing the ABI in any case.

> 
> 
>>
>> ---
>>
>>  lib/efd/rte_efd.c | 2 +-
>>  lib/efd/rte_efd.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
>> index 77f46809f8..68a2378e88 100644
>> --- a/lib/efd/rte_efd.c
>> +++ b/lib/efd/rte_efd.c
>> @@ -495,7 +495,7 @@ efd_search_hash(struct rte_efd_table * const table,
>>
>>  struct rte_efd_table *
>>  rte_efd_create(const char *name, uint32_t max_num_rules, uint32_t key_len,
>> -   uint8_t online_cpu_socket_bitmask, uint8_t 
>> offline_cpu_socket)
>> +   uint64_t online_cpu_socket_bitmask, uint8_t 
>> offline_cpu_socket)
>>  {
>> struct rte_efd_table *table = NULL;
>> uint8_t *key_array = NULL;
>> diff --git a/lib/efd/rte_efd.h b/lib/efd/rte_efd.h
>> index c2be4c09ae..d3d7befd0c 100644
>> --- a/lib/efd/rte_efd.h
>> +++ b/lib/efd/rte_efd.h
>> @@ -139,7 +139,7 @@ typedef uint16_t efd_hashfunc_t;
>>   */
>>  struct rte_efd_table *
>>  rte_efd_create(const char *name, uint32_t max_num_rules, uint32_t key_len,
>> -   uint8_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket);
>> +   uint64_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket);
>>
>>  /**
>>   * Releases the resources from an EFD table
>> --
>> 2.25.1
>>
> 
> 


Re: [dpdk-dev] [PATCH] net/virtio: fix device configure without jumbo Rx offload

2021-09-14 Thread Maxime Coquelin



On 9/2/21 4:39 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Use max-pkt-len only if jumbo frames offload is requested
> since otherwise this field isn't valid.
> 
> Fixes: 8b90e4358112 ("net/virtio: set offload flag for jumbo frames")
> Fixes: 4e8169eb0d2d ("net/virtio: fix Rx scatter offload")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index e58085a2c9..9bce6833db 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -2099,10 +2099,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>   return ret;
>   }
>  
> - if (rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len)
> + if ((rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) &&
> + (rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len))
>   req_features &= ~(1ULL << VIRTIO_NET_F_MTU);
>  
> - hw->max_rx_pkt_len = rxmode->max_rx_pkt_len;
> + if (rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME)
> + hw->max_rx_pkt_len = rxmode->max_rx_pkt_len;
> + else
> + hw->max_rx_pkt_len = ether_hdr_len + dev->data->mtu;
>  
>   if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM |
>  DEV_RX_OFFLOAD_TCP_CKSUM))
> 

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v1 1/6] build: increase default of max lcores to 512

2021-09-14 Thread David Hunt



On 14/9/2021 11:00 AM, David Marchand wrote:

On Tue, Sep 14, 2021 at 11:34 AM David Hunt  wrote:


On 10/9/2021 9:24 AM, Thomas Monjalon wrote:

10/09/2021 10:06, David Marchand:

On Fri, Sep 10, 2021 at 9:54 AM Bruce Richardson
 wrote:

On Fri, Sep 10, 2021 at 08:51:04AM +0200, David Marchand wrote:

On Thu, Sep 9, 2021 at 4:38 PM Bruce Richardson
 wrote:

On Thu, Sep 09, 2021 at 02:45:06PM +0100, David Hunt wrote:

Modern processors are coming with an ever increasing number of cores,
and 128 does not seem like a sensible max limit any more, especially
when you consider multi-socket systems with Hyper-Threading enabled.

This patch increases max_lcores default from 128 to 512.

Signed-off-by: David Hunt 

Why should we need this?

--lcores makes it possible to pin 128 lcores to any physical core on
your system.
And for applications that have their own thread management, they can
pin thread, then use rte_thread_register.

Do you have applications that require more than 128 lcores?


The trouble is that using the --lcores syntax for mapping high core numbers
to low lcore ids is much more awkward to use. Every case of DPDK use I've
seen uses -c with a coremask, or -l with just giving a few core numbers on
it. This simple scheme won't work with core numbers greater than 128, and
there are already systems available with more than that number of cores.

Apart from the memory footprint issues - which this patch is already making
a good start in addressing, why would we not increase the default
max_lcores to that seen on real systems?

The memory footprint is a major issue to me, and reserving all those
lcores won't be needed in any system.
We will also have to decide on a "640k ought to be enough" value to
avoid ABI issue with the next processor that comes out and has more
than 512 cores.

Could we wire the -c / -l options to --lcores behavior ?
It breaks the 1:1 lcore/physical core assumption, but it solves your
usability issue.

Why would we change existing options while we already have an option
(--lcores) which solves the issue above?
I think the only issue is to educate users.
Is there something to improve in the documentation?


Hi all,
I agree that it’s a good idea to switch to using the “--lcrores” option

Let's avoid typo in the error message you'll add :-).



for cores above the default, that’s already future proofed.
However, I’m still a little concerned about usability, if our users are
accustomed to the “-c” and “-l” options, I suggest that we add a warning
to suggest using the “--lcores” option if any of the cores provided on
the command line are above RTE_MAX_LCORE. That would help them with the
solution to using physical cores above 128 (or whatever the compiled
default is).

Example:

“ERROR: logical core 212 is above the maximum lcore number permitted.
Please use the --lcores option to map lcores onto physical cores, e.g.
--lcores="(0-3)@(212-215).”

If you could directly provide the right --lcores syntax based on what
user provided with -c or -l, it would be even better.
This should be not that difficult.



Agreed. I now have something working that when given "-l 12-16,130,132", 
will output the following:


EAL: One of the 7 cores provided exceeds RTE_MAX_LCORE (128)
EAL: Please use --lcores instead, e.g. --lcores "(0-6)@(12-16,130,132)"

So you can just cut-and-paste that option into your command line. Makes 
it very easy for users to migrate.






I’ll replace the first patch in the set with a patch that adds the
additional information in the error message.





Re: [dpdk-dev] [PATCH] net/virtio: fix device configure without jumbo Rx offload

2021-09-14 Thread Andrew Rybchenko
On 9/14/21 2:07 PM, Maxime Coquelin wrote:
> 
> 
> On 9/2/21 4:39 PM, Andrew Rybchenko wrote:
>> From: Ivan Ilchenko 
>>
>> Use max-pkt-len only if jumbo frames offload is requested
>> since otherwise this field isn't valid.
>>
>> Fixes: 8b90e4358112 ("net/virtio: set offload flag for jumbo frames")
>> Fixes: 4e8169eb0d2d ("net/virtio: fix Rx scatter offload")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Ivan Ilchenko 
>> Signed-off-by: Andrew Rybchenko 
>> ---
>>  drivers/net/virtio/virtio_ethdev.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c 
>> b/drivers/net/virtio/virtio_ethdev.c
>> index e58085a2c9..9bce6833db 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -2099,10 +2099,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>>  return ret;
>>  }
>>  
>> -if (rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len)
>> +if ((rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) &&
>> +(rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len))
>>  req_features &= ~(1ULL << VIRTIO_NET_F_MTU);
>>  
>> -hw->max_rx_pkt_len = rxmode->max_rx_pkt_len;
>> +if (rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME)
>> +hw->max_rx_pkt_len = rxmode->max_rx_pkt_len;
>> +else
>> +hw->max_rx_pkt_len = ether_hdr_len + dev->data->mtu;
>>  
>>  if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM |
>> DEV_RX_OFFLOAD_TCP_CKSUM))
>>
> 
> Reviewed-by: Maxime Coquelin 

Maxime, please, make letters lower case in From E-mail on
applying.


Re: [dpdk-dev] [PATCH] vhost: remove copy threshold for async vhost

2021-09-14 Thread Maxime Coquelin



On 8/20/21 2:44 PM, Jiayu Hu wrote:
> Copy threshold is introduced in async vhost data path to select
> the appropriate copy engine to do copies for higher efficiency.
> However, it may cause packets out-of-order, and it also causes
> data path performance unpredictable.
> 
> Therefore, this patch removes copy threshold support in async vhost
> data path.
> 
> Signed-off-by: Jiayu Hu 
> Signed-off-by: Cheng Jiang 
> ---
>  doc/guides/prog_guide/vhost_lib.rst |   7 -
>  examples/vhost/main.c   |  22 +-
>  lib/vhost/rte_vhost_async.h |  22 +-
>  lib/vhost/vhost.c   |   6 +-
>  lib/vhost/vhost.h   |   1 -
>  lib/vhost/virtio_net.c  | 439 
> +---
>  6 files changed, 116 insertions(+), 381 deletions(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: remove handling of zero desc number on RxQ setup

2021-09-14 Thread Maxime Coquelin



On 8/20/21 2:47 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Rx queue setup callback allows to use the whole ring when
> descriptor number argument equals zero. There's no point to
> handle zero in any way since RTE Rx queue setup function
> rte_eth_rx_queue_setup() doesn't pass zero using fallback
> values.
> 
> Fixes: 3be82f5cc5e3 ("ethdev: support PMD-tuned Tx/Rx parameters")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: fix reporting of mbufs allocated on RxQ setup

2021-09-14 Thread Maxime Coquelin



On 8/20/21 2:47 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Rx queue setup finish function may report wrong number of
> allocated mbufs in case of in-order feature. Fix the
> function to not ignore allocation error and count only
> successfully allocated number of buffers.
> 
> Fixes: e5f456a98d3 ("net/virtio: support in-order Rx and Tx")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: turn SW RxQ size to that of split vec. virtqueue

2021-09-14 Thread Maxime Coquelin



On 8/20/21 2:47 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Descriptors number may be set less than queue size for split queue
> vectorized Rx path. Pointers to mbufs for received packets are
> obtained from SW ring, that is initially filled with them in the end
> of queue setup in virtio_dev_rx_queue_setup_finish(). The begin of the
> SW ring filled up to the size of descriptors number. At queue size
> offset from the begin of the SW ring pointers to some fake mbuf are also
> set for wrapping purpose. So the ring may contains the hole of invalid
> pointers from descriptors number offset to queue size offset, and split
> vectorized Rx routines could write to the invalid addresses since they
> use the ring up to the queue size. Fix this by setting descriptors
> number to queue size on Rx queue setup.
> 
> Fixes: fc3d66212fed ("virtio: add vector Rx")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_rxtx.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: report max/min/align desc limits in dev info get

2021-09-14 Thread Maxime Coquelin



On 8/20/21 2:48 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Report max/min/align descriptors limits in device info get callback.
> Before calling the callback, rte_eth_dev_info_get() provides
> default values of nb_min as zero and nb_max as UINT16_MAX that are
> not correct for the driver, so one can't rely on them.
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 25 +
>  1 file changed, 25 insertions(+)

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v4] net/virtio: fix repeated memory free of vq

2021-09-14 Thread Maxime Coquelin



On 8/31/21 4:39 PM, Gaoxiang Liu wrote:
> When virtio_init_queue returns error, the memory of vq is freed.
> But the value of hw->vqs[queue_idx] does not restore.
> If virtio_init_queue returns error, the memory of vq is freed again
> in virtio_free_queues.
> 
> Fixes: 69c80d4ef89b ("net/virtio: allocate queue at init stage")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Gaoxiang Liu 
> ---
> 
> v2:
> * Fix spelling warning
> 
> v3:
> * Add detailed log
> 
> v4:
> * Update the email address
> ---
>  drivers/net/virtio/virtio_ethdev.c | 1 +
>  1 file changed, 1 insertion(+)
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v2 0/2] virtio-user interrupt fixes

2021-09-14 Thread Maxime Coquelin



On 8/31/21 5:54 PM, David Marchand wrote:
> Trying to use virtio-user as a replacement for taps in OVS, I ended up
> with some fixes on the interrupt side.
> The patches for OVS are not ready yet, but sending the DPDK fixes in
> any case.
> 
> @CI guys:
> Patch 2 probably means there is a hole in the virtio-user interrupt
> test plan in DTS (I could not find a setup with
> virtio-user+rx interrupts+multi queue).
> 
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: fix device configure without jumbo Rx offload

2021-09-14 Thread Maxime Coquelin



On 9/2/21 4:39 PM, Andrew Rybchenko wrote:
> From: Ivan Ilchenko 
> 
> Use max-pkt-len only if jumbo frames offload is requested
> since otherwise this field isn't valid.
> 
> Fixes: 8b90e4358112 ("net/virtio: set offload flag for jumbo frames")
> Fixes: 4e8169eb0d2d ("net/virtio: fix Rx scatter offload")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Ilchenko 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: fix device configure without jumbo Rx offload

2021-09-14 Thread Maxime Coquelin



On 9/14/21 1:17 PM, Andrew Rybchenko wrote:
> On 9/14/21 2:07 PM, Maxime Coquelin wrote:
>>
>>
>> On 9/2/21 4:39 PM, Andrew Rybchenko wrote:
>>> From: Ivan Ilchenko 
>>>
>>> Use max-pkt-len only if jumbo frames offload is requested
>>> since otherwise this field isn't valid.
>>>
>>> Fixes: 8b90e4358112 ("net/virtio: set offload flag for jumbo frames")
>>> Fixes: 4e8169eb0d2d ("net/virtio: fix Rx scatter offload")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Ivan Ilchenko 
>>> Signed-off-by: Andrew Rybchenko 
>>> ---
>>>  drivers/net/virtio/virtio_ethdev.c | 8 ++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_ethdev.c 
>>> b/drivers/net/virtio/virtio_ethdev.c
>>> index e58085a2c9..9bce6833db 100644
>>> --- a/drivers/net/virtio/virtio_ethdev.c
>>> +++ b/drivers/net/virtio/virtio_ethdev.c
>>> @@ -2099,10 +2099,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>>> return ret;
>>> }
>>>  
>>> -   if (rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len)
>>> +   if ((rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) &&
>>> +   (rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len))
>>> req_features &= ~(1ULL << VIRTIO_NET_F_MTU);
>>>  
>>> -   hw->max_rx_pkt_len = rxmode->max_rx_pkt_len;
>>> +   if (rx_offloads & DEV_RX_OFFLOAD_JUMBO_FRAME)
>>> +   hw->max_rx_pkt_len = rxmode->max_rx_pkt_len;
>>> +   else
>>> +   hw->max_rx_pkt_len = ether_hdr_len + dev->data->mtu;
>>>  
>>> if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM |
>>>DEV_RX_OFFLOAD_TCP_CKSUM))
>>>
>>
>> Reviewed-by: Maxime Coquelin 
> 
> Maxime, please, make letters lower case in From E-mail on
> applying.
> 

Done!



Re: [dpdk-dev] [PATCH v7] vhost: fix crash on port deletion

2021-09-14 Thread Maxime Coquelin



On 9/2/21 5:45 PM, Gaoxiang Liu wrote:
> The rte_vhost_driver_unregister() and vhost_user_read_cb()
> can be called at the same time by 2 threads.
> when memory of vsocket is freed in rte_vhost_driver_unregister(),
> the invalid memory of vsocket is accessed in vhost_user_read_cb().
> It's a bug of both mode for vhost as server or client.
> 
> E.g., vhostuser port is created as server.
> Thread1 calls rte_vhost_driver_unregister().
> Before the listen fd is deleted from poll waiting fds,
> "vhost-events" thread then calls vhost_user_server_new_connection(),
> then a new conn fd is added in fdset when trying to reconnect.
> "vhost-events" thread then calls vhost_user_read_cb() and
> accesses invalid memory of socket while thread1 frees the memory of
> vsocket.
> 
> E.g., vhostuser port is created as client.
> Thread1 calls rte_vhost_driver_unregister().
> Before vsocket of reconn is deleted from reconn list,
> "vhost_reconn" thread then calls vhost_user_add_connection()
> then a new conn fd is added in fdset when trying to reconnect.
> "vhost-events" thread then calls vhost_user_read_cb() and
> accesses invalid memory of socket while thread1 frees the memory of
> vsocket.
> 
> The fix is to move the "fdset_try_del" in front of free memory of conn,
> then avoid the race condition.
> 
> The core trace is:
> Program terminated with signal 11, Segmentation fault.
> 
> Fixes: 52d874dc6705 ("vhost: fix crash on closing in client mode")
> 
> Signed-off-by: Gaoxiang Liu 
> ---
> 
> v2:
> * Fix coding style issues.
> 
> v3:
> * Add detailed log.
> 
> v4:
> * Add the reason, when vhostuser port is created as server.
> 
> v5:
> * Add detailed log when vhostuser port is created as client
> 
> v6:
> * Add 'path' check before deleting listen fd
> * Fix spelling issues
> 
> v7:
> * Fix coding style issues.
> ---
>  lib/vhost/socket.c | 107 ++---
>  1 file changed, 53 insertions(+), 54 deletions(-)
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] vhost: promote some APIs to stable

2021-09-14 Thread Maxime Coquelin



On 9/7/21 4:58 AM, Chenbo Xia wrote:
> As reported by symbol bot, APIs listed in this patch have been
> experimental for more than two years. This patch promotes these
> 18 APIs to stable.
> 
> Signed-off-by: Chenbo Xia 
> ---
>  lib/vhost/rte_vhost.h| 13 -
>  lib/vhost/rte_vhost_crypto.h |  5 -
>  lib/vhost/version.map| 36 ++--
>  3 files changed, 18 insertions(+), 36 deletions(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v1 1/6] build: increase default of max lcores to 512

2021-09-14 Thread David Marchand
On Tue, Sep 14, 2021 at 1:07 PM David Hunt  wrote:
> >> “ERROR: logical core 212 is above the maximum lcore number permitted.
> >> Please use the --lcores option to map lcores onto physical cores, e.g.
> >> --lcores="(0-3)@(212-215).”
> > If you could directly provide the right --lcores syntax based on what
> > user provided with -c or -l, it would be even better.
> > This should be not that difficult.
>
>
> Agreed. I now have something working that when given "-l 12-16,130,132",
> will output the following:
>
> EAL: One of the 7 cores provided exceeds RTE_MAX_LCORE (128)
> EAL: Please use --lcores instead, e.g. --lcores "(0-6)@(12-16,130,132)"

That's not equivalent.

(0-6)@(12-16,130,132) means 7 lcores with each lcore running on the
same group of physical cores.
-l 12-16,130,132 means 7 lcores running on dedicated physical cores.
I would expect 0@12,1@13,2@14,3@15,4@16,5@130,6@132


You can see with debug logs:

$ echo quit | ./build/app/dpdk-testpmd --log-level=*:debug --no-huge
-m 512 --lcores '(0-2)@(0-2)' -- --total-num-mbufs 2048 |& grep
lcore.*is.ready
EAL: Main lcore 0 is ready (tid=7feb9550bc00;cpuset=[0,1,2])
EAL: lcore 1 is ready (tid=7feb909ce700;cpuset=[0,1,2])
EAL: lcore 2 is ready (tid=7feb901cd700;cpuset=[0,1,2])

vs

$ echo quit | ./build/app/dpdk-testpmd --log-level=*:debug --no-huge
-m 512 --lcores 0@0,1@1,2@2 -- --total-num-mbufs 2048 |& grep
lcore.*is.ready
EAL: Main lcore 0 is ready (tid=7fba1cd1ac00;cpuset=[0])
EAL: lcore 2 is ready (tid=7fba179dc700;cpuset=[2])
EAL: lcore 1 is ready (tid=7fba181dd700;cpuset=[1])


-- 
David Marchand



Re: [dpdk-dev] [PATCH v2] vhost: add log print of socket path on adding connection

2021-09-14 Thread Maxime Coquelin



On 9/7/21 2:51 AM, Gaoxiang Liu wrote:
> Add log print of socket path in vhost_user_add_connection.
> It's useful when adding a mass of socket connections,
> because the information of every connection is clearer.
> 
> Fixes: 8f972312b8f4 ("vhost: support vhost-user")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Gaoxiang Liu 
> ---
>  lib/vhost/socket.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime



[dpdk-dev] DPDK 21.11 NVIDIA Mellanox Roadmap

2021-09-14 Thread Shy Shyman
Below is NVIDIA Mellanox's roadmap for DPDK21.11, on which we are currently 
working:





ethdev new APIs:

===



[1] Introduce an optimization in memory/performance for the case of scaled-up 
interfaces.

  Motivation: An application (e.g. OVS) polls all representors 
queues. Each queue contains descriptors, and each descriptor is utilizing 
mbufs. As the number of interfaces grows (e.g. 1k Scalable Functions(SFs) ), 
the memory footprint grows dramatically (#queues X depth_of_queue X 
mbufs_memory X 1k ports), and CPU usage becomes inefficient, due to cache 
evictions between the queue contexts. The new optimization will aggregate the 
queues into a single one. It will reduce the number of entities to poll as well 
as reduce the memory footprint, allowing streamlined and efficient processing 
with much less cache evictions.



rte_flow new APIs:





[2] Extend rte_flow api to support the definition of flexible parsers.

  Motivation: NVIDIA Mellanox NICs supports flexible parser 
configuration, and we've made use of that capability within the mlx5 PMD 
before. Now we are exposing an API to allow applications to configure the NIC 
to support matching over custom/non-supported protocol. With that configuration 
done, matching can be applied to traffic using that protocol.



mlx5 PMD updates:

==

mlx5 PMD will support the rte_flow update changes listed above and below



[3]Extend mlx5 PMD capability to support up to 512 interfaces(VFs,SFs)

  Motivation: Allow applications like VDPA to utilize larger number 
of interfaces. Another example would be in the DPU in which hundreds of 
applications can be supported using SFs



rte_mempool updates:

===



[4] Improve memory registration and sharing between drivers

  Motivation: In a Data Processing Unit (DPU) environment, there's 
a need to share data between the host memory and the DPU/arm memory to 
facilitate fast data transfer of different drivers like regex and network that 
operates on the same physical device. For that, we are refactoring the memory 
registration and sharing method so that the memory region registration will be 
abstracted through that method (not left for each driver to do) which will 
enable sharing of a memory region between host and DPU/arm memory subset. 
Together with this change, wewill also optimize the huge page initialization 
and cross NUMA memory registration to speed up application start-up time.



testpmd updates:



 testpmd updated to support the changes listed above



Re: [dpdk-dev] [PATCH v3 1/3] eal/linux: make hugetlbfs analysis reusable

2021-09-14 Thread John Levon
On Tue, Sep 14, 2021 at 01:34:54PM +0300, Dmitry Kozlyuk wrote:

> get_hugepage_dir() searched for a hugetlbfs mount with a given page size
> using handcraft parsing of /proc/mounts and mixing traversal logic with
> selecting the needed entry. Separate code to enumerate hugetlbfs mounts
> to eal_hugepage_mount_walk() taking a callback that can inspect already
> parsed entries. Use mntent(3) API for parsing. This allows to reuse
> enumeration logic in subsequent patches.

Hi, are you planning to implement my pending change on top of this?

thanks
john

Re: [dpdk-dev] Questions about rte_eth_link_speed_to_str API

2021-09-14 Thread Min Hu (Connor)

Thanks Stephen,
While I think this option is more clear and simple:
+const char *
+rte_eth_link_speed_to_str(uint32_t link_speed)
+{
+#define SPEED_STRING_LEN 16
+   static char name[SPEED_STRING_LEN];
+
+   if (link_speed == ETH_SPEED_NUM_NONE)
+   return "None";
+   if (link_speed == ETH_SPEED_NUM_UNKNOWN)
+   return "Unknown";
+   if (link_speed < ETH_SPEED_NUM_1G) {
+   snprintf(name, sizeof(name), "%u Mbps", link_speed);
+   } else if (link_speed % ETH_SPEED_NUM_1G != 0){
+   snprintf(name, sizeof(name), "%.1f Gbps",
+   (double)link_speed / ETH_SPEED_NUM_1G);
+   } else {
+   snprintf(name, sizeof(name), "%u Gbps",
+   link_speed / ETH_SPEED_NUM_1G);
+   }
+
+   return (const char *)name;
+}

How about any others' opinions, thanks.

在 2021/9/14 14:59, Stephen Hemminger 写道:

On Tue, 14 Sep 2021 11:25:44 +0800
"Min Hu (Connor)"  wrote:


Thanks Thomas,
I am not sure if we need to  print combined slaves speed.
How about others' opinion ? @all

BTW, If yes, one possible option may be like that:
+const char *
+rte_eth_link_speed_to_str(uint32_t link_speed)
+{
+   char name[16];
+
+   if (link_speed == ETH_SPEED_NUM_NONE)
+   return "None";
+   if (link_speed == ETH_SPEED_NUM_NONE)
+   return "Unknown";
+   if (link_speed < ETH_SPEED_NUM_1G) {
+   snprintf(name, sizeof(name), "%u Mbps", link_speed);
+   } else {
+   snprintf(name, sizeof(name), "%u Mbps",
+   link_speed / ETH_SPEED_NUM_1G);
+   }
+
+   return name;
+}

But the float value is difficult to handle, like 2.5 Gbps for show. Any
advices?

在 2021/9/13 18:26, Thomas Monjalon 写道:

13/09/2021 10:45, Min Hu (Connor):

Hi all,
I have questions about rte_eth_link_speed_to_str API.
The API converts link speed to string for display, But it only
supports the following speeds, like that:
case ETH_SPEED_NUM_NONE: return "None";
case ETH_SPEED_NUM_10M:  return "10 Mbps";
case ETH_SPEED_NUM_100M: return "100 Mbps";
case ETH_SPEED_NUM_1G:   return "1 Gbps";
case ETH_SPEED_NUM_2_5G: return "2.5 Gbps";
case ETH_SPEED_NUM_5G:   return "5 Gbps";
case ETH_SPEED_NUM_10G:  return "10 Gbps";
case ETH_SPEED_NUM_20G:  return "20 Gbps";
case ETH_SPEED_NUM_25G:  return "25 Gbps";
case ETH_SPEED_NUM_40G:  return "40 Gbps";
case ETH_SPEED_NUM_50G:  return "50 Gbps";
case ETH_SPEED_NUM_56G:  return "56 Gbps";
case ETH_SPEED_NUM_100G: return "100 Gbps";
case ETH_SPEED_NUM_200G: return "200 Gbps";
case ETH_SPEED_NUM_UNKNOWN: return "Unknown";
default: return "Invalid";

In some cases, like bonding, for example, three slaves which
link speed are 10Gbps, so link speed of bonding port will be
30Gbps, but it shows "Invalid".

Is this reasonable? any comments will be welcome.


Is it meaningful to print combined slaves speed?
If yes, we can do better then this fixed switch/case logic,
it shouldn't be too hard given it is a standard uint32_t value.


.
   


Since all the values are encoded numerically do some math.
This is what iproute2 has evolved to doing..


int print_color_rate(bool use_iec, enum output_type type, enum color_attr color,
  const char *key, const char *fmt, unsigned long long rate)
{
 unsigned long kilo = use_iec ? 1024 : 1000;
 const char *str = use_iec ? "i" : "";
 static char *units[5] = {"", "K", "M", "G", "T"};
 char *buf;
 int rc;
 int i;

 if (_IS_JSON_CONTEXT(type))
 return print_color_lluint(type, color, key, "%llu", rate);

 rate <<= 3; /* bytes/sec -> bits/sec */

 for (i = 0; i < ARRAY_SIZE(units) - 1; i++)  {
 if (rate < kilo)
 break;
 if (((rate % kilo) != 0) && rate < 1000*kilo)
 break;
 rate /= kilo;
 }

 rc = asprintf(&buf, "%.0f%s%sbit", (double)rate, units[i],
   i > 0 ? str : "");
 if (rc < 0)
 return -1;

 rc = print_color_string(type, color, key, fmt, buf);
 free(buf);
}
 


.



[dpdk-dev] [PATCH v2] examples/ipsec-secgw: add support for event vector

2021-09-14 Thread Srujana Challa
Adds event vector support to inline protocol offload mode.
By default vector support is disabled, it can be enabled by
using the option --event-vector.
Additional options to configure vector size and vector timeout are
also implemented and can be used by specifying --vector-size and
--vector-tmo.

Signed-off-by: Srujana Challa 
---
Depends-on: series-18262 ("security: Improve inline fast path routines")
Depends-on: series-18322 ("eventdev: simplify Rx adapter event vector
config")

v2:
* Set rte_event_vector::attr_valid if all packets in the vector uses
same port.

 doc/guides/sample_app_ug/ipsec_secgw.rst |  18 +-
 examples/ipsec-secgw/event_helper.c  |  78 -
 examples/ipsec-secgw/event_helper.h  |   8 +
 examples/ipsec-secgw/ipsec-secgw.c   |  41 ++-
 examples/ipsec-secgw/ipsec-secgw.h   |   2 +
 examples/ipsec-secgw/ipsec_worker.c  | 350 ++-
 6 files changed, 492 insertions(+), 5 deletions(-)

diff --git a/doc/guides/sample_app_ug/ipsec_secgw.rst 
b/doc/guides/sample_app_ug/ipsec_secgw.rst
index 78171b25f9..557ca510f7 100644
--- a/doc/guides/sample_app_ug/ipsec_secgw.rst
+++ b/doc/guides/sample_app_ug/ipsec_secgw.rst
@@ -86,6 +86,15 @@ The application supports two modes of operation: poll mode 
and event mode.
   threads and supports inline protocol only.** It also provides infrastructure 
for
   non-internal port however does not define any worker threads.
 
+  Event mode also supports event vectorization. The event devices, ethernet 
device
+  pairs which support the capability 
``RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR`` can
+  aggregate packets based on flow characteristics and generate a ``rte_event``
+  containing ``rte_event_vector``.
+  The aggregation size and timeout can be given using command line options 
vector-size
+  (default vector-size is 16) and vector-tmo (default vector-tmo is 102400ns).
+  By default event vectorization is disabled and it can be enabled using 
event-vector
+  option.
+
 Additionally the event mode introduces two submodes of processing packets:
 
 * Driver submode: This submode has bare minimum changes in the application to 
support
@@ -293,7 +302,8 @@ event app mode::
 
 .//examples/dpdk-ipsec-secgw -c 0x3 -- -P -p 0x3 -u 0x1   \
-f /path/to/config_file --transfer-mode event \
-   --event-schedule-type parallel\
+   --event-schedule-type parallel --event-vector --vector-size 32\
+   --vector-tmo 102400   \
 
 where each option means:
 
@@ -312,6 +322,12 @@ where each option means:
 
 *   The ``--event-schedule-type`` option selects parallel ordering of event 
queues.
 
+*   The ``--event-vector`` option enables event vectorization.
+
+*   The ``--vector-size`` option specifies max vector size.
+
+*   The ``--vector-tmo`` option specifies max timeout in nanoseconds for 
vectorization.
+
 
 Refer to the *DPDK Getting Started Guide* for general information on running
 applications and the Environment Abstraction Layer (EAL) options.
diff --git a/examples/ipsec-secgw/event_helper.c 
b/examples/ipsec-secgw/event_helper.c
index 8475d542b2..e8600f5e90 100644
--- a/examples/ipsec-secgw/event_helper.c
+++ b/examples/ipsec-secgw/event_helper.c
@@ -10,6 +10,10 @@
 #include 
 
 #include "event_helper.h"
+#include "ipsec-secgw.h"
+
+#define DEFAULT_VECTOR_SIZE  16
+#define DEFAULT_VECTOR_TMO   102400
 
 static volatile bool eth_core_running;
 
@@ -728,6 +732,45 @@ eh_initialize_eventdev(struct eventmode_conf *em_conf)
return 0;
 }
 
+static int
+eh_event_vector_limits_validate(struct eventmode_conf *em_conf,
+   uint8_t ev_dev_id, uint8_t ethdev_id)
+{
+   struct rte_event_eth_rx_adapter_vector_limits limits = {0};
+   uint16_t vector_size = em_conf->ext_params.vector_size;
+   int ret;
+
+   ret = rte_event_eth_rx_adapter_vector_limits_get(ev_dev_id, ethdev_id,
+&limits);
+   if (ret) {
+   EH_LOG_ERR("failed to get vector limits");
+   return ret;
+   }
+
+   if (vector_size < limits.min_sz || vector_size > limits.max_sz) {
+   EH_LOG_ERR("Vector size [%d] not within limits min[%d] max[%d]",
+  vector_size, limits.min_sz, limits.max_sz);
+   return -EINVAL;
+   }
+
+   if (limits.log2_sz && !rte_is_power_of_2(vector_size)) {
+   EH_LOG_ERR("Vector size [%d] not power of 2", vector_size);
+   return -EINVAL;
+   }
+
+   if (em_conf->vector_tmo_ns > limits.max_timeout_ns ||
+   em_conf->vector_tmo_ns < limits.min_timeout_ns) {
+   EH_LOG_ERR("Vector timeout [%" PRIu64
+  "] not within limits max[%" PRIu64
+  "] min[%" PRIu64 "]",
+  em_conf->vector_tmo_ns,
+  limits.max_timeout_ns,
+   

Re: [dpdk-dev] [RFC 0/7] hide eth dev related structures

2021-09-14 Thread Ananyev, Konstantin

Hi Jerin,

> > NOTE: This is just an RFC to start further discussion and collect the 
> > feedback.
> > Due to significant amount of work, changes required are applied only to two
> > PMDs so far: net/i40e and net/ice.
> > So to build it you'll need to add:
> > -Denable_drivers='common/*,mempool/*,net/ice,net/i40e'
> > to your config options.
> 
> >
> > That approach was selected to avoid(/minimize) possible performance losses.
> >
> > So far I done only limited amount functional and performance testing.
> > Didn't spot any functional problems, and performance numbers
> > remains the same before and after the patch on my box (testpmd, macswap 
> > fwd).
> 
> 
> Based on testing on octeonxt2. We see some regression in testpmd and
> bit on l3fwd too.
> 
> Without patch: 73.5mpps/core in testpmd iofwd
> With out patch: 72 5mpps/core in testpmd iofwd
> 
> Based on my understanding it is due to additional indirection.

From your patch below, it looks like not actually additional indirection,
but extra memory dereference - func and dev pointers are now stored
at different places. Plus the fact that now we dereference rte_eth_devices[]
data inside PMD function. Which probably prevents compiler and CPU to load
 rte_eth_devices[port_id].data and rte_eth_devices[port_id]. 
pre_tx_burst_cbs[queue_id]  
in advance before calling actual RX/TX function.
About your approach: I don’t mind to add extra opaque 'void *data' pointer,
but would prefer not to expose callback invocations code into inline function.
Main reason for that - I think it still need to be reworked to allow 
adding/removing 
callbacks without stopping the device. Something similar to what was done for 
cryptodev
callbacks. To be able to do that in future without another ABI breakage 
callbacks related part
needs to be kept internal.
Though what we probably can do: add two dynamic arrays of opaque pointers to  
rte_eth_burst_api.
One for rx/tx queue data pointers, second for rx/tx callback pointers.
To be more specific, something like:

typedef uint16_t (*rte_eth_rx_burst_t)( void *rxq, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts, void *cbs);
typedef uint16_t (*rte_eth_tx_burst_t)(void *txq, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts, void *cbs);


struct rte_eth_burst_api {
rte_eth_rx_burst_t rx_pkt_burst;
/**< PMD receive function. */
rte_eth_tx_burst_t tx_pkt_burst;
/**< PMD transmit function. */
rte_eth_tx_prep_t tx_pkt_prepare;
/**< PMD transmit prepare function. */
rte_eth_rx_queue_count_t rx_queue_count;
/**< Get the number of used RX descriptors. */
rte_eth_rx_descriptor_status_t rx_descriptor_status;
/**< Check the status of a Rx descriptor. */
rte_eth_tx_descriptor_status_t tx_descriptor_status;
/**< Check the status of a Tx descriptor. */
struct {
 void **queue_data;   /* point to 
rte_eth_devices[port_id].data-> rx_queues */
 void **cbs;  /*  points to 
rte_eth_devices[port_id].post_rx_burst_cbs */ 
   } rx_data, tx_data;
} __rte_cache_aligned;

static inline uint16_t
rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
   struct rte_eth_burst_api *p;

if (port_id >= RTE_MAX_ETHPORTS || queue_id >= RTE_MAX_QUEUES_PER_PORT)
return 0;
 
  p =  &rte_eth_burst_api[port_id];
  return p->rx_pkt_burst(p->rx_data.queue_data[queue_id], rx_pkts, nb_pkts, 
p->rx_data.cbs[queue_id]);
}

Same for TX.

If that looks ok to everyone, I'll try to prepare next version based on that.
In theory that should avoid extra dereference problem and even reduce 
indirection.
As a drawback data->rxq/txq should always be allocated for 
RTE_MAX_QUEUES_PER_PORT entries,
but I presume that’s not a big deal.

As a side question - is there any reason why rte_ethdev_trace_rx_burst() is 
invoked at very last point,
while rte_ethdev_trace_tx_burst()  after CBs but before actual tx_pkt_burst()?
It would make things simpler if tracng would always be done either on entrance 
or exit of rx/tx_burst.

> 
> My suggestion to fix the problem by:
> Removing the additional `data` redirection and pull callback function
> pointers back
> and keep rest as opaque as done in the existing patch like [1]
> 
> I don't believe this has any real implication on future ABI stability
> as we will not be adding
> any new item in rte_eth_fp in any way as new features can be added in slowpath
> rte_eth_dev as mentioned in the patch.
> 
> [2] is the patch of doing the same as I don't see any performance
> regression after [2].
> 
> 
> [1]
> - struct rte_eth_burst_api {
> - struct rte_eth_fp {
> + void *data;
>   rte_eth_rx_burst_t rx_pkt_burst;
>   /**< PMD receive function. */
>   rte_eth_tx_burst_t tx_pkt_burst;
> @@ -85,8 +100,19 @@ struct rte_eth_burst_api {
>   /**< Check the status of a Rx descriptor. */
>   rte_eth_tx_descriptor_statu

Re: [dpdk-dev] [PATCH v3] net/pcap: support buffer size parameter

2021-09-14 Thread Ferruh Yigit
On 8/28/2021 10:47 AM, Qiming Chen wrote:
> When the pcap port is probed, the size of the pcap message buffer is not
> set, the default is 2M, and then this value has a great impact on the
> message forwarding performance. Therefore, parameters are provided for
> users to set.
> 

Hi Qiming,

I assume you suggest buffer should be set bigger than 2M for better performance.
I can see following description for "pcap message buffer" [1], I am not clear
why this impacts the performance, can you please clarify?
If the producer rate is higher than consumer rate, performance would be same but
bigger buffer only delays the packet drops. It may only help on the case
producer has peaks, but still not sure why it increase the performance.
I did quick checks and not observed any performance improvement, can you please
detail your usecase?


Another concern is below description mentions "On some platforms, the buffer's
size can be set". Pcap PMD now supports Windows too (cc'ed Dmitry), I wonder if
this features is supported on Windows?


[1]
buffer size
Packets that arrive for a capture are stored in a buffer, so that they do not
have to be read by the application as soon as they arrive. On some platforms,
the buffer's size can be set; a size that's too small could mean that, if too
many packets are being captured and the snapshot length doesn't limit the amount
of data that's buffered, packets could be dropped if the buffer fills up before
the application can read packets from it, while a size that's too large could
use more non-pageable operating system memory than is necessary to prevent
packets from being dropped.
The buffer size is set with pcap_set_buffer_size().


> In order to pass the buffer size parameter parsed by the probe to the
> start function, the buf_size member variable is added to the
> struct pmd_process_private structure. At the same time, for the uniform
> code style, the buf_size member variable is also added to the
> struct pmd_devargs structure, which is used by the probe function.
> 

Why added to process_private data, but not to 'struct pmd_internals'. Process
private data is for the variables that will be different for primary and
secondary process.

> Signed-off-by: Qiming Chen 
> ---
> v2:
>   Clear coding style warning.
> v3:
>   When buf_size=0, the modification keeps the old implementation unchanged.
> ---
>  drivers/net/pcap/pcap_ethdev.c | 78 +-
>  1 file changed, 68 insertions(+), 10 deletions(-)

Documentation also needs to be updated: 'doc/guides/nics/pcap_ring.rst'

> 
> diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
> index a8774b7a43..fdc74313d5 100644
> --- a/drivers/net/pcap/pcap_ethdev.c
> +++ b/drivers/net/pcap/pcap_ethdev.c
> @@ -33,6 +33,7 @@
>  #define ETH_PCAP_IFACE_ARG"iface"
>  #define ETH_PCAP_PHY_MAC_ARG  "phy_mac"
>  #define ETH_PCAP_INFINITE_RX_ARG  "infinite_rx"
> +#define ETH_PCAP_BUF_SIZE_ARG "buf_size"
>  
>  #define ETH_PCAP_ARG_MAXLEN  64
>  
> @@ -98,6 +99,7 @@ struct pmd_process_private {
>   pcap_t *rx_pcap[RTE_PMD_PCAP_MAX_QUEUES];
>   pcap_t *tx_pcap[RTE_PMD_PCAP_MAX_QUEUES];
>   pcap_dumper_t *tx_dumper[RTE_PMD_PCAP_MAX_QUEUES];
> + int buf_size;
>  };
>  
>  struct pmd_devargs {
> @@ -109,6 +111,7 @@ struct pmd_devargs {
>   const char *type;
>   } queue[RTE_PMD_PCAP_MAX_QUEUES];
>   int phy_mac;
> + int buf_size;
>  };
>  
>  struct pmd_devargs_all {
> @@ -131,6 +134,7 @@ static const char *valid_arguments[] = {
>   ETH_PCAP_IFACE_ARG,
>   ETH_PCAP_PHY_MAC_ARG,
>   ETH_PCAP_INFINITE_RX_ARG,
> + ETH_PCAP_BUF_SIZE_ARG,
>   NULL
>  };
>  
> @@ -521,11 +525,46 @@ open_iface_live(const char *iface, pcap_t **pcap) {
>  }
>  
>  static int
> -open_single_iface(const char *iface, pcap_t **pcap)
> +open_single_iface(const char *iface, int buf_size, pcap_t **pcap)
>  {
> - if (open_iface_live(iface, pcap) < 0) {
> - PMD_LOG(ERR, "Couldn't open interface %s", iface);
> - return -1;
> + if (buf_size == 0) {
> + if (open_iface_live(iface, pcap) < 0) {
> + PMD_LOG(ERR, "Couldn't open interface %s", iface);
> + return -1;
> + }
> + } else {
> + pcap_t *p = pcap_create(iface, errbuf);
> + if (p == NULL) {
> + PMD_LOG(ERR, "Couldn't create %s pcap", iface);
> + return -1;
> + }
> +
> + if (pcap_set_snaplen(p, RTE_ETH_PCAP_SNAPLEN) < 0) {
> + PMD_LOG(ERR, "Couldn't set %s pcap snaplen", iface);
> + return -1;
> + }
> +
> + if (pcap_set_promisc(p, RTE_ETH_PCAP_PROMISC) < 0) {
> + PMD_LOG(ERR, "Couldn't set %s pcap promisc", iface);
> + return -1;
> + }
> +
> + if (pcap_set_timeout(p, RTE_ETH_PCAP_TIMEOUT) < 0) {
> + P

Re: [dpdk-dev] [PATCH] build: propagate Windows system dependencies to pkg-config

2021-09-14 Thread Thomas Monjalon
20/08/2021 18:08, William Tu:
> On Thu, Aug 19, 2021 at 4:15 PM Dmitry Kozlyuk  
> wrote:
> >
> > Windows EAL depends on some system libraries. They were linked using
> > add_project_link_arguments('-l'), which prevented meson from adding
> > them to Libs.private of pkg-config file. As a result, applications using
> > pkg-config to find DPDK hit link errors, for example:
> >
> > librte_eal.a(eal_windows_eal_debug.c.obj) : error LNK2019: unresolved
> > external symbol __imp_SymInitialize referenced in function
> > rte_dump_stack
> >
> > Reference required libraries in EAL using ext_deps meson variable.
> > bus/pci and net/pcap depend on lib/eal and will pull them automatically.
> > Drop advapi32 dependency, as MinGW locates VirtualAlloc2() dynamically.
> >
> > Fixes: 2a5d547a4a9b ("eal/windows: implement basic memory management")
> > Fixes: c91717eb75c8 ("eal/windows: support exit and panic")
> > Cc: tal...@nvidia.com
> > Cc: sta...@dpdk.org
> >
> > Reported-by: William Tu 
> > Signed-off-by: Dmitry Kozlyuk 
> > ---
> 
> Thanks for the fix.
> I've tested on my Windows environment and it works ok.
> the libdpdk.pc shows the required libraries.
> 
> Acked-by: William Tu 

Converted to Tested-by.

Applied, thanks




Re: [dpdk-dev] [PATCH v7] ethdev: add IPv4 and L4 checksum RSS offload types

2021-09-14 Thread Ferruh Yigit
On 8/31/2021 10:52 AM, Alvin Zhang wrote:
> This patch defines new RSS offload types for IPv4 and
> L4(TCP/UDP/SCTP) checksum, which are required when users want
> to distribute packets based on the IPv4 or L4 checksum field.
> 
> For example "flow create 0 ingress pattern eth / ipv4 / end
> actions rss types ipv4-chksum end queues end / end", this flow
> causes all matching packets to be distributed to queues on
> basis of IPv4 checksum.
> 
> Signed-off-by: Alvin Zhang 
> Acked-by: Ajit Khaparde 
> Acked-by: Aman Deep Singh 
> ---
> 
> v6: rebase to eeedef70, update some note
> v7: fix code style issues
> ---
>  app/test-pmd/cmdline.c |  4 +++-
>  app/test-pmd/config.c  |  2 ++
>  doc/guides/rel_notes/release_21_11.rst |  5 +
>  lib/ethdev/rte_ethdev.h| 24 
>  4 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 82253bc..656a311 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -2252,6 +2252,8 @@ struct cmd_config_rss {
>   rss_conf.rss_hf = ETH_RSS_ECPRI;
>   else if (!strcmp(res->value, "mpls"))
>   rss_conf.rss_hf = ETH_RSS_MPLS;
> + else if (!strcmp(res->value, "ipv4-chksum"))
> + rss_conf.rss_hf = ETH_RSS_IPV4_CHKSUM;
>   else if (!strcmp(res->value, "none"))
>   rss_conf.rss_hf = 0;
>   else if (!strcmp(res->value, "level-default")) {
> @@ -2323,7 +2325,7 @@ struct cmd_config_rss {
>   .help_str = "port config all rss "
>   "all|default|eth|vlan|ip|tcp|udp|sctp|ether|port|vxlan|geneve|"
>   
> "nvgre|vxlan-gpe|l2tpv3|esp|ah|pfcp|ecpri|mpls|none|level-default|"
> - "level-outer|level-inner|",
> + "level-outer|level-inner|ipv4-chksum|",
>   .tokens = {
>   (void *)&cmd_config_rss_port,
>   (void *)&cmd_config_rss_keyword,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 31d8ba1..ece78f2 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -140,6 +140,8 @@
>   { "gtpu", ETH_RSS_GTPU },
>   { "ecpri", ETH_RSS_ECPRI },
>   { "mpls", ETH_RSS_MPLS },
> + { "ipv4-chksum", ETH_RSS_IPV4_CHKSUM },
> + { "l4-chksum", ETH_RSS_L4_CHKSUM },>{ NULL, 0 },
>  };
>  
> diff --git a/doc/guides/rel_notes/release_21_11.rst 
> b/doc/guides/rel_notes/release_21_11.rst
> index d707a55..fa29b13 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -55,6 +55,11 @@ New Features
>   Also, make sure to start the actual text at the margin.
>   ===
>  
> +* **Add new RSS offload types for IPv4/L4 checksum in RSS flow.**
> +
> +  Add macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
> +  TCP/UDP/SCTP header checksum field can be used as input set for RSS.
> +
>  
>  Removed Items
>  -
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index d2b27c3..e6734df 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -537,6 +537,30 @@ struct rte_eth_rss_conf {
>  #define ETH_RSS_PPPOE   (1ULL << 31)
>  #define ETH_RSS_ECPRI   (1ULL << 32)
>  #define ETH_RSS_MPLS(1ULL << 33)
> +#define ETH_RSS_IPV4_CHKSUM (1ULL << 34)
> +
> +/**
> + * The ETH_RSS_L4_CHKSUM generally refers to a type of checksum field for

what does 'generally' means here? Is there a case it refers to something else?

> + * any L4 header, such as TCP, UDP and SCTP. It is similar to ETH_RSS_PORT,
> + * it does not specify the type of L4 header.
> + * We use this macro to replace below macro for constricting the use of RSS
> + * offload bits:
> + * ETH_RSS_IPV4_TCP_CHKSUM
> + * ETH_RSS_IPV4_UDP_CHKSUM
> + * ETH_RSS_IPV4_SCTP_CHKSUM
> + * ETH_RSS_IPV6_TCP_CHKSUM
> + * ETH_RSS_IPV6_UDP_CHKSUM
> + * ETH_RSS_IPV6_SCTP_CHKSUM

As I get you are listing them here to say the 'ETH_RSS_L4_CHKSUM' replaces
possible usage of above list, but my concern is it may confuse people as those
macros exists (or did exist in the past), so what do you think to remove them?


And just to confirm, we can't use this flag, 'ETH_RSS_L4_CHKSUM' anymore with
'rte_eth_rss_conf.rss_hf', right? Since it will be missing some context for it.
Which means some old APIs (and configuration) won't support this new offload,
but only rte_flow will support it.
If above is correct should it be highlighted in above comment?

> + *
> + * Then how to use this macro? We can use it in RSS flow where the pattern

Can we convert this question to a description just to be a little more formal?

> + * type will specify the L4 header type, for example "flow create 0 ingress \
> + * pattern eth / ipv4 / tcp / end actions rss types l4-chksum  end queues 
> end \
> + * / end"
> + *
> + * For the case that checksum is not used in a UDP header, i

Re: [dpdk-dev] [RFC 2/7] eth: make drivers to use new API for Rx

2021-09-14 Thread Ananyev, Konstantin


Hi Ferruh,

> 
> Overall this enables us hiding the ethdev internals, which is good. But it
> duplicates most of the datapath function (rx burst for this patch) per each 
> PMD ops.

Yes, same as right now rte_eth_rx/tx_burst() code can be duplicated in dozen 
places 
inside user-level code. And as any other 'static inline' function that we 
define and use inside DPDK.
Personally I don't see why it is a problem.

> 
> I wonder if we can have the callbacks ('_rte_eth_rx_epilog()') as separate
> function, this still enables us to hide the structs. Of course additional
> function call will bring some overhead, but if we enabled callbacks and 
> calling
> them per packet, do we really care about additional function call?

Callbacks are not per packet, but per burst of packets - same as actual RX/TX.
A drawback with such approach -  we either have to keep
post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] visible to the user
(which I'd prefer not to), or call epilolg() unconditionally - which means
performance drop. 
 
> 
> > Signed-off-by: Konstantin Ananyev 
> 
> <...>
> 
> > @@ -3229,7 +3289,7 @@ int
> >  ice_rx_burst_mode_get(struct rte_eth_dev *dev, __rte_unused uint16_t 
> > queue_id,
> >   struct rte_eth_burst_mode *mode)
> >  {
> > -   eth_rx_burst_t pkt_burst = dev->rx_pkt_burst;
> > +   rte_eth_rx_burst_t pkt_burst = rte_eth_get_rx_burst(dev->data->port_id);
> 
> Does it makes easier to orginanise the patchset to have a separate patch to
> switch first to 'rte_eth_get_rx_burst()' / 'rte_eth_set_rx_burst()' with old
> implementation ('dev->rx_pkt_burst' get/set), and later just change the
> 'rte_eth_get_rx_burst()' / 'rte_eth_set_rx_burst()' implementation when
> structure is updated.

This is doable, don't know would be there any benefit from that or not. 

> 
> <...>
> 
> > --- a/drivers/net/ice/ice_rxtx_vec_sse.c
> > +++ b/drivers/net/ice/ice_rxtx_vec_sse.c
> > @@ -587,13 +587,15 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, 
> > struct rte_mbuf **rx_pkts,
> >   * - nb_pkts > ICE_VPMD_RX_BURST, only scan ICE_VPMD_RX_BURST
> >   *   numbers of DD bits
> >   */
> > -uint16_t
> > +static inline uint16_t
> 
> These functions eventually will be called via a function pointer, so is there 
> a
> benefit to request them to 'inline', why not just 'static' ?

Agree.
 
> <...>
> 
> > +_RTE_ETH_RX_DEF(ice_recv_scattered_pkts_vec)
> > +
> 
> This will duplicate most of the Rx burst function for each PMD Rx ops.
> 
> <...>
> 
> > +
> > +#define _RTE_ETH_FUNC(fn)  _rte_eth_##fn
> > +
> 
> Do we need this macro? The functions are still 'static', so they won't be
> visible to application and there won't be a namespace problem.

Not all RX/TX burst functions are defined as 'static'.
 
> Dropping and just use the original fucntion name may reduce the changes in the
> drivers.

It allows to keep existing RX/TX functions intact - no need to change 
prototype, add prolog/epilog, etc. manually.
Instead these macros help to create a wrapper functions around existing ones, 
that will
become new public entry points. 
All that should help to make changes faster and in a safer manner.
Though these macros are just helper ones to simplify the transition.
if someone will prefer to make changes in all their RX/TX function by hand - 
that is still possible. 
 
> <...>
> 
> > +__rte_experimental
> > +rte_eth_rx_burst_t rte_eth_get_rx_burst(uint16_t port_id);
> > +
> > +__rte_experimental
> > +int rte_eth_set_rx_burst(uint16_t port_id, rte_eth_rx_burst_t rxf);
> 
> can s/__rte_experimental/__rte_internal/

OK.
 
> <...>
> 
> > +
> > +__rte_experimental
> > +rte_eth_rx_burst_t
> > +rte_eth_get_rx_burst(uint16_t port_id)
> > +{
> > +   if (port_id >= RTE_DIM(rte_eth_burst_api)) {
> > +   rte_errno = EINVAL;
> > +   return NULL;
> > +   }
> > +   return rte_eth_burst_api[port_id].rx_pkt_burst;
> > +}
> > +
> > +__rte_experimental
> > +int
> > +rte_eth_set_rx_burst(uint16_t port_id, rte_eth_rx_burst_t rxf)
> > +{
> > +   if (port_id >= RTE_DIM(rte_eth_burst_api))
> > +   return -EINVAL;
> > +
> > +   rte_eth_burst_api[port_id].rx_pkt_burst = rxf;
> > +   return 0;
> > +}
> 
> Since these are internal functions for drivers, it can be easier for drivers 
> to
> use directly with 'struct rte_eth_dev *eth_dev', instead of 'port_id'.
> 
> So instead of APIs getting 'port_id' as parameter, they can get 'struct
> rte_eth_dev *eth_dev'? Drivers for sure will have 'eth_dev' references for 
> their
> device.

I am fine either way - it is a control path internal function.
 
> Overall, I think make sense for all public APIs to have handler ('port_id') as
> parameter, and all driver APIs to have 'eth_device' as paramter.
> 
> <...>
> 
> > @@ -4981,44 +4981,11 @@ static inline uint16_t
> >  rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
> >  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
> >  {
> > -   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > -   uint16_t nb_rx;
> > -
> > -

[dpdk-dev] [PATCH v3 0/3] eal: add memory pre-allocation from existing files

2021-09-14 Thread Dmitry Kozlyuk
Hugepage allocation from the system takes time, resulting in slow
startup or sporadic delays later. Most of the time spent in kernel
is zero-filling memory for security reasons, which may be irrelevant
in a controlled environment. The bottleneck is memory access speed,
so for speeduup the amount of memory cleared must be reduced.
We propose a new EAL option --mem-file FILE1,FILE2,... to quickly
allocate dirty pages from existing files and clean it as necessary.
A new malloc_perf_autotest is provided to estimate the impact.
More details are explained in relevant patches.

v3: fix hugepage mount point detection
v2: fix CI failures

Dmitry Kozlyuk (2):
  eal/linux: make hugetlbfs analysis reusable
  app/test: add allocator performance autotest

Viacheslav Ovsiienko (1):
  eal: add memory pre-allocation from existing files

 app/test/meson.build  |   2 +
 app/test/test_malloc_perf.c   | 157 +
 doc/guides/linux_gsg/linux_eal_parameters.rst |  17 +
 lib/eal/common/eal_common_dynmem.c|   6 +
 lib/eal/common/eal_common_options.c   |  23 ++
 lib/eal/common/eal_internal_cfg.h |   4 +
 lib/eal/common/eal_memalloc.h |   8 +-
 lib/eal/common/eal_options.h  |   2 +
 lib/eal/common/malloc_elem.c  |   5 +
 lib/eal/common/malloc_heap.h  |   8 +
 lib/eal/common/rte_malloc.c   |  16 +-
 lib/eal/include/rte_memory.h  |   4 +-
 lib/eal/linux/eal.c   |  28 ++
 lib/eal/linux/eal_hugepage_info.c | 158 ++---
 lib/eal/linux/eal_hugepage_info.h |  39 +++
 lib/eal/linux/eal_memalloc.c  | 328 +-
 16 files changed, 735 insertions(+), 70 deletions(-)
 create mode 100644 app/test/test_malloc_perf.c
 create mode 100644 lib/eal/linux/eal_hugepage_info.h

-- 
2.25.1



[dpdk-dev] [PATCH v3 1/3] eal/linux: make hugetlbfs analysis reusable

2021-09-14 Thread Dmitry Kozlyuk
get_hugepage_dir() searched for a hugetlbfs mount with a given page size
using handcraft parsing of /proc/mounts and mixing traversal logic with
selecting the needed entry. Separate code to enumerate hugetlbfs mounts
to eal_hugepage_mount_walk() taking a callback that can inspect already
parsed entries. Use mntent(3) API for parsing. This allows to reuse
enumeration logic in subsequent patches.

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Viacheslav Ovsiienko 
---
Cc: John Levon 

 lib/eal/linux/eal_hugepage_info.c | 153 +++---
 lib/eal/linux/eal_hugepage_info.h |  39 
 2 files changed, 135 insertions(+), 57 deletions(-)
 create mode 100644 lib/eal/linux/eal_hugepage_info.h

diff --git a/lib/eal/linux/eal_hugepage_info.c 
b/lib/eal/linux/eal_hugepage_info.c
index d97792cade..726a086ab3 100644
--- a/lib/eal/linux/eal_hugepage_info.c
+++ b/lib/eal/linux/eal_hugepage_info.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +35,7 @@
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
 #include "eal_hugepages.h"
+#include "eal_hugepage_info.h"
 #include "eal_filesystem.h"
 
 static const char sys_dir_path[] = "/sys/kernel/mm/hugepages";
@@ -195,73 +197,110 @@ get_default_hp_size(void)
return size;
 }
 
-static int
-get_hugepage_dir(uint64_t hugepage_sz, char *hugedir, int len)
+int
+eal_hugepage_mount_walk(eal_hugepage_mount_walk_cb *cb, void *cb_arg)
 {
-   enum proc_mount_fieldnames {
-   DEVICE = 0,
-   MOUNTPT,
-   FSTYPE,
-   OPTIONS,
-   _FIELDNAME_MAX
-   };
-   static uint64_t default_size = 0;
-   const char proc_mounts[] = "/proc/mounts";
-   const char hugetlbfs_str[] = "hugetlbfs";
-   const size_t htlbfs_str_len = sizeof(hugetlbfs_str) - 1;
-   const char pagesize_opt[] = "pagesize=";
-   const size_t pagesize_opt_len = sizeof(pagesize_opt) - 1;
-   const char split_tok = ' ';
-   char *splitstr[_FIELDNAME_MAX];
-   char buf[BUFSIZ];
-   int retval = -1;
-   const struct internal_config *internal_conf =
-   eal_get_internal_configuration();
-
-   FILE *fd = fopen(proc_mounts, "r");
-   if (fd == NULL)
-   rte_panic("Cannot open %s\n", proc_mounts);
+   static const char PATH[] = "/proc/mounts";
+   static const char OPTION[] = "pagesize";
+
+   static uint64_t default_size;
+
+   FILE *f = NULL;
+   struct mntent *m;
+   char *hugepage_sz_str;
+   uint64_t hugepage_sz;
+   int ret = -1;
+
+   f = setmntent(PATH, "r");
+   if (f == NULL) {
+   RTE_LOG(ERR, EAL, "%s(): setmntent(%s): %s\n",
+   __func__, PATH, strerror(errno));
+   goto exit;
+   }
 
if (default_size == 0)
default_size = get_default_hp_size();
 
-   while (fgets(buf, sizeof(buf), fd)){
-   if (rte_strsplit(buf, sizeof(buf), splitstr, _FIELDNAME_MAX,
-   split_tok) != _FIELDNAME_MAX) {
-   RTE_LOG(ERR, EAL, "Error parsing %s\n", proc_mounts);
-   break; /* return NULL */
-   }
+   ret = 0;
+   do {
+   m = getmntent(f);
+   if (m == NULL)
+   break;
 
-   /* we have a specified --huge-dir option, only examine that dir 
*/
-   if (internal_conf->hugepage_dir != NULL &&
-   strcmp(splitstr[MOUNTPT], 
internal_conf->hugepage_dir) != 0)
+   if (strcmp(m->mnt_type, "hugetlbfs") != 0)
continue;
 
-   if (strncmp(splitstr[FSTYPE], hugetlbfs_str, htlbfs_str_len) == 
0){
-   const char *pagesz_str = strstr(splitstr[OPTIONS], 
pagesize_opt);
-
-   /* if no explicit page size, the default page size is 
compared */
-   if (pagesz_str == NULL){
-   if (hugepage_sz == default_size){
-   strlcpy(hugedir, splitstr[MOUNTPT], 
len);
-   retval = 0;
-   break;
-   }
-   }
-   /* there is an explicit page size, so check it */
-   else {
-   uint64_t pagesz = 
rte_str_to_size(&pagesz_str[pagesize_opt_len]);
-   if (pagesz == hugepage_sz) {
-   strlcpy(hugedir, splitstr[MOUNTPT], 
len);
-   retval = 0;
-   break;
-   }
+   hugepage_sz_str = hasmntopt(m, OPTION);
+   if (hugepage_sz_str != NULL) {
+   hugepage_sz_str += strlen(OPTION) + 1; /* +1 fo

[dpdk-dev] [PATCH v3 2/3] eal: add memory pre-allocation from existing files

2021-09-14 Thread Dmitry Kozlyuk
From: Viacheslav Ovsiienko 

The primary DPDK process launch might take a long time if initially
allocated memory is large. From practice allocation of 1 TB of memory
over 1 GB hugepages on Linux takes tens of seconds. Fast restart
is highly desired for some applications and launch delay presents
a problem.

The primary delay happens in this call trace:
  rte_eal_init()
rte_eal_memory_init()
  rte_eal_hugepage_init()
eal_dynmem_hugepage_init()
  eal_memalloc_alloc_seg_bulk()
alloc_seg()
  mmap()

The largest part of the time spent in mmap() is filling the memory
with zeros. Kernel does so to prevent data leakage from a process
that was last using the page. However, in a controlled environment
it may not be the issue, while performance is. (Linux-specific
MAP_UNINITIALIZED flag allows mapping without clearing, but it is
disabled in all popular distributions for the reason above.)

It is proposed to add a new EAL option: --mem-file FILE1,FILE2,...
to map hugepages "as is" from specified FILEs in hugetlbfs.
Compared to using external memory for the task, EAL option requires
no change to application code, while allowing administrator
to control hugepage sizes and their NUMA affinity.

Limitations of the feature:

* Linux-specific (only Linux maps hugepages from files).
* Incompatible with --legacy-mem (partially replaces it).
* Incompatible with --single-file-segments
  (--mem-file FILEs can contain as many segments as needed).
* Incompatible with --in-memory (logically).

A warning about possible security implications is printed
when --mem-file is used.

Until this patch DPDK allocator always cleared memory on freeing,
so that it did not have to do that on allocation, while new memory
was cleared by the kernel. When --mem-file is in use, DPDK clears memory
after allocation in rte_zmalloc() and does not clean it on freeing.
Effectively user trades fast startup for occasional allocation slowdown
whenever it is absolutely necessary. When memory is recycled, it is
cleared again, which is suboptimal par se, but saves complication
of memory management.

Signed-off-by: Viacheslav Ovsiienko 
Signed-off-by: Dmitry Kozlyuk 
---
 doc/guides/linux_gsg/linux_eal_parameters.rst |  17 +
 lib/eal/common/eal_common_dynmem.c|   6 +
 lib/eal/common/eal_common_options.c   |  23 ++
 lib/eal/common/eal_internal_cfg.h |   4 +
 lib/eal/common/eal_memalloc.h |   8 +-
 lib/eal/common/eal_options.h  |   2 +
 lib/eal/common/malloc_elem.c  |   5 +
 lib/eal/common/malloc_heap.h  |   8 +
 lib/eal/common/rte_malloc.c   |  16 +-
 lib/eal/include/rte_memory.h  |   4 +-
 lib/eal/linux/eal.c   |  28 ++
 lib/eal/linux/eal_hugepage_info.c |   5 +
 lib/eal/linux/eal_memalloc.c  | 328 +-
 13 files changed, 441 insertions(+), 13 deletions(-)

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst 
b/doc/guides/linux_gsg/linux_eal_parameters.rst
index bd3977cb3d..b465feaea8 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -92,6 +92,23 @@ Memory-related options
 
 Free hugepages back to system exactly as they were originally allocated.
 
+*   ``--mem-file ``
+
+Use memory from pre-allocated files in ``hugetlbfs`` without clearing it;
+when this memory is exhausted, switch to default dynamic allocation.
+This speeds up startup compared to ``--legacy-mem`` while also avoiding
+later delays for allocating new hugepages. One downside is slowdown
+of all zeroed memory allocations. Security warning: an application
+can access contents left by previous users of hugepages. Multiple files
+can be pre-allocated in ``hugetlbfs`` with different page sizes,
+on desired NUMA nodes, using ``mount`` options and ``numactl``:
+
+--mem-file /mnt/huge-1G/node0,/mnt/huge-1G/node1,/mnt/huge-2M/extra
+
+This option is incompatible with ``--legacy-mem``, ``--in-memory``,
+and ``--single-file-segments``. Primary and secondary processes
+must specify exactly the same list of files.
+
 Other options
 ~
 
diff --git a/lib/eal/common/eal_common_dynmem.c 
b/lib/eal/common/eal_common_dynmem.c
index 7c5437ddfa..abcf22f097 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -272,6 +272,12 @@ eal_dynmem_hugepage_init(void)
internal_conf->num_hugepage_sizes) < 0)
return -1;
 
+#ifdef RTE_EXEC_ENV_LINUX
+   /* pre-allocate pages from --mem-file option files */
+   if (eal_memalloc_memfile_alloc(used_hp) < 0)
+   return -1;
+#endif
+
for (hp_sz_idx = 0;
hp_sz_idx < (int)internal_conf->num_hugepage_sizes;
hp_sz_idx++) {
diff --git a/lib/eal/common/eal_common_opt

[dpdk-dev] [PATCH v3 3/3] app/test: add allocator performance autotest

2021-09-14 Thread Dmitry Kozlyuk
Memory allocator performance is crucial to applications that deal
with large amount of memory or allocate frequently. DPDK allocator
performance is affected by EAL options, API used and, at least,
allocation size. New autotest is intended to be run with different
EAL options. It measures performance with a range of sizes
for dirrerent APIs: rte_malloc, rte_zmalloc, and rte_memzone_reserve.

Work distribution between allocation and deallocation depends on EAL
options. The test prints both times and total time to ease comparison.

Memory can be filled with zeroes at different points of allocation path,
but it always takes considerable fraction of overall timing. This is why
the test measures filling speed and prints how long clearing would take
for each size as a hint.

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Viacheslav Ovsiienko 
---
 app/test/meson.build|   2 +
 app/test/test_malloc_perf.c | 157 
 2 files changed, 159 insertions(+)
 create mode 100644 app/test/test_malloc_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..a48dc79463 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -84,6 +84,7 @@ test_sources = files(
 'test_lpm6_perf.c',
 'test_lpm_perf.c',
 'test_malloc.c',
+'test_malloc_perf.c',
 'test_mbuf.c',
 'test_member.c',
 'test_member_perf.c',
@@ -281,6 +282,7 @@ fast_tests = [
 
 perf_test_names = [
 'ring_perf_autotest',
+'malloc_perf_autotest',
 'mempool_perf_autotest',
 'memcpy_perf_autotest',
 'hash_perf_autotest',
diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
new file mode 100644
index 00..4435894095
--- /dev/null
+++ b/app/test/test_malloc_perf.c
@@ -0,0 +1,157 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+typedef void * (alloc_t)(const char *name, size_t size, unsigned int align);
+typedef void (free_t)(void *addr);
+
+static const uint64_t KB = 1 << 10;
+static const uint64_t GB = 1 << 30;
+
+static double
+tsc_to_us(uint64_t tsc, size_t runs)
+{
+   return (double)tsc / rte_get_tsc_hz() * US_PER_S / runs;
+}
+
+static int
+test_memset_perf(double *us_per_gb)
+{
+   static const size_t RUNS = 20;
+
+   void *ptr;
+   size_t i;
+   uint64_t tsc;
+
+   puts("Performance: memset");
+
+   ptr = rte_malloc(NULL, GB, 0);
+   if (ptr == NULL) {
+   printf("rte_malloc(size=%"PRIx64") failed\n", GB);
+   return -1;
+   }
+
+   tsc = rte_rdtsc_precise();
+   for (i = 0; i < RUNS; i++)
+   memset(ptr, 0, GB);
+   tsc = rte_rdtsc_precise() - tsc;
+
+   *us_per_gb = tsc_to_us(tsc, RUNS);
+   printf("Result: %f.3 GiB/s <=> %.2f us/MiB\n",
+   US_PER_S / *us_per_gb, *us_per_gb / KB);
+
+   rte_free(ptr);
+   putchar('\n');
+   return 0;
+}
+
+static int
+test_alloc_perf(const char *name, alloc_t *alloc_fn, free_t free_fn,
+   size_t max_runs, double memset_gb_us)
+{
+   static const size_t SIZES[] = {
+   1 << 6, 1 << 7, 1 << 10, 1 << 12, 1 << 16, 1 << 20,
+   1 << 21, 1 << 22, 1 << 24, 1 << 30 };
+
+   size_t i, j;
+   void **ptrs;
+
+   printf("Performance: %s\n", name);
+
+   ptrs = calloc(max_runs, sizeof(ptrs[0]));
+   if (ptrs == NULL) {
+   puts("Cannot allocate memory for pointers");
+   return -1;
+   }
+
+   printf("%12s%8s%12s%12s%12s%12s\n",
+   "Size (B)", "Runs", "Alloc (us)", "Free (us)",
+   "Total (us)", "memset (us)");
+   for (i = 0; i < RTE_DIM(SIZES); i++) {
+   size_t size = SIZES[i];
+   size_t runs_done;
+   uint64_t tsc_start, tsc_alloc, tsc_free;
+   double alloc_time, free_time, memset_time;
+
+   tsc_start = rte_rdtsc_precise();
+   for (j = 0; j < max_runs; j++) {
+   ptrs[j] = alloc_fn(NULL, size, 0);
+   if (ptrs[j] == NULL)
+   break;
+   }
+   tsc_alloc = rte_rdtsc_precise() - tsc_start;
+
+   if (j == 0) {
+   printf("%12zu Interrupted: out of memory.\n", size);
+   break;
+   }
+   runs_done = j;
+
+   tsc_start = rte_rdtsc_precise();
+   for (j = 0; j < runs_done && ptrs[j] != NULL; j++)
+   free_fn(ptrs[j]);
+   tsc_free = rte_rdtsc_precise() - tsc_start;
+
+   alloc_time = tsc_to_us(tsc_alloc, runs_done);
+   free_time = tsc_to_us(tsc_free, runs_done);
+   memset_time = memset_gb_us * size / GB;
+   printf("%12zu%8zu%12.2f%12.2f%12.2f%12.2f\n",
+   size, runs_done

[dpdk-dev] [Bug 809] KNI request overwritten with new asynchronous kni_net_release mechanism

2021-09-14 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=809

Bug ID: 809
   Summary: KNI request overwritten with new asynchronous
kni_net_release mechanism
   Product: DPDK
   Version: 21.11
  Hardware: x86
OS: Linux
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: ercli...@gmail.com
  Target Milestone: ---

With the new asynchronous KNI request for kni_net_release() there is a
potential for an interface down request to be overwritten in the KNI request
fifo.  The issue occurs when an application sets an interface down, immediately
followed by setting the interface up.


The down request gets put on the KNI fifo and then returns immediately (inside
kni_net_process_request) .  If a subsequent up request comes in before the down
request has had a chance to be processed by the request handler, the new up
request gets written to the same KNI sync address, thus overwriting the down
request and the first down request will be missed:

/* Construct data */
memcpy(kni->sync_kva, req, sizeof(struct rte_kni_request)); <--
Overwritten
num = kni_fifo_put(kni->req_q, &kni->sync_va, 1);

Before this asynchronous mechanism was introduced this was prevented by taking
the kni->sync_lock and waiting for the response.  But even in this case if the
wait timeout was exceeded, this same type of condition could theoretically
happen.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] [PATCH v5 0/3] security: Improve inline fast path routines

2021-09-14 Thread Nithin Dabilpuram
Improvements to Inline inbound and outbound processing fast path routines
rte_security_set_pkt_metadata() and rte_security_get_userdata() to make
them inline functions and also provide mechanism for drivers to support
fast userdata and metadata access instead of driver specific per-pkt
function callbacks.

This series updates requirements of mbuf fields to be updated for outbound
inline processing.

Nithin Dabilpuram (3):
  security: enforce semantics for Tx inline processing
  security: add option for faster udata or mdata access
  examples/ipsec-secgw: update event mode inline path

v5:
- Squash 4/4 patch to 2/4 and update release notes

v4:
- Removed entry from deprecation notice.
- Fixed issue with rte_security_set_pkt_metadata() to pass instance instead
  of device ptr to non-inline C function.

v3:
- Rebased and fixed compilation issue with rte_security_get_userdata() on
  32-bit platform
- Updated l2_len on patch 3/3 only for outbound.

v2:
- Remove restrictions on rte_security_set_pkt_metadata() w.r.t pkt content
- Add inline functions for rte_security_set_pkt_metadata() and 
  rte_security_get_userdata() and also faster mdata, udata access via
  patch 2/3

 doc/guides/nics/features.rst   |  2 ++
 doc/guides/rel_notes/deprecation.rst   |  4 ---
 doc/guides/rel_notes/release_21_08.rst |  6 +
 examples/ipsec-secgw/ipsec-secgw.c |  2 ++
 examples/ipsec-secgw/ipsec_worker.c| 41 +++--
 lib/mbuf/rte_mbuf_core.h   |  2 ++
 lib/security/rte_security.c|  8 +++---
 lib/security/rte_security.h| 48 +++---
 lib/security/version.map   |  2 ++
 9 files changed, 89 insertions(+), 26 deletions(-)

-- 
2.8.4



[dpdk-dev] [PATCH v5 1/3] security: enforce semantics for Tx inline processing

2021-09-14 Thread Nithin Dabilpuram
Not all net PMD's/HW can parse packet and identify L2 header and
L3 header locations on Tx. This is inline with other Tx offloads
requirements such as L3 checksum, L4 checksum offload, etc,
where mbuf.l2_len, mbuf.l3_len etc, needs to be set for HW to be
able to generate checksum. Since Inline IPSec is also such a Tx
offload, some PMD's at least need mbuf.l2_len to be valid to
find L3 header and perform Outbound IPSec processing.

Hence, this patch updates documentation to enforce setting
mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
for Inline IPSec Crypto / Protocol offload processing to
work on Tx.

Signed-off-by: Nithin Dabilpuram 
Acked-by: Akhil Goyal 
---
 doc/guides/nics/features.rst | 2 ++
 lib/mbuf/rte_mbuf_core.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index a96e12d..4fce8cd 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD 
documentation for more deta
 
 * **[uses]   rte_eth_rxconf,rte_eth_rxmode**: 
``offloads:DEV_RX_OFFLOAD_SECURITY``,
 * **[uses]   rte_eth_txconf,rte_eth_txmode**: 
``offloads:DEV_TX_OFFLOAD_SECURITY``.
+* **[uses]   mbuf**: ``mbuf.l2_len``.
 * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
   ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, 
``capabilities_get``.
 * **[provides] rte_eth_dev_info**: 
``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
@@ -451,6 +452,7 @@ protocol operations. See security library and PMD 
documentation for more details
 
 * **[uses]   rte_eth_rxconf,rte_eth_rxmode**: 
``offloads:DEV_RX_OFFLOAD_SECURITY``,
 * **[uses]   rte_eth_txconf,rte_eth_txmode**: 
``offloads:DEV_TX_OFFLOAD_SECURITY``.
+* **[uses]   mbuf**: ``mbuf.l2_len``.
 * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
   ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, 
``get_userdata``,
   ``capabilities_get``.
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index bb38d7f..9d8e3dd 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -228,6 +228,8 @@ extern "C" {
 
 /**
  * Request security offload processing on the TX packet.
+ * To use Tx security offload, the user needs to fill l2_len in mbuf
+ * indicating L2 header size and where L3 header starts.
  */
 #define PKT_TX_SEC_OFFLOAD (1ULL << 43)
 
-- 
2.8.4



[dpdk-dev] [PATCH v5 2/3] security: add option for faster udata or mdata access

2021-09-14 Thread Nithin Dabilpuram
Currently rte_security_set_pkt_metadata() and rte_security_get_userdata()
methods to set pkt metadata on Inline outbound and get userdata
after Inline inbound processing is always driver specific callbacks.

For drivers that do not have much to do in the callbacks but just
to update metadata in rte_security dynamic field and get userdata
from rte_security dynamic field, having to just to PMD specific
callback is costly per packet operation. This patch provides
a mechanism to do the same in inline function and avoid function
pointer jump if a driver supports the same.

Signed-off-by: Nithin Dabilpuram 
Acked-by: Akhil Goyal 
---
 doc/guides/rel_notes/deprecation.rst   |  4 ---
 doc/guides/rel_notes/release_21_08.rst |  6 +
 lib/security/rte_security.c|  8 +++---
 lib/security/rte_security.h| 48 +++---
 lib/security/version.map   |  2 ++
 5 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 59445a6..70ef45e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -276,10 +276,6 @@ Deprecation Notices
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
 
-* security: The functions ``rte_security_set_pkt_metadata`` and
-  ``rte_security_get_userdata`` will be made inline functions and additional
-  flags will be added in structure ``rte_security_ctx`` in DPDK 21.11.
-
 * cryptodev: The structure ``rte_crypto_op`` would be updated to reduce
   reserved bytes to 2 (from 3), and use 1 byte to indicate warnings and other
   information from the crypto/security operation. This field will be used to
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index b4cbf2d..59ff15a 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -223,6 +223,12 @@ ABI Changes
 
 * No ABI change that would break compatibility with 20.11.
 
+* security: ``rte_security_set_pkt_metadata`` and ``rte_security_get_userdata``
+  routines used by Inline outbound and Inline inbound security processing are
+  made inline and enhanced to do simple 64-bit set/get for PMD's that donot
+  have much processing in PMD specific callbacks but just 64-bit set/get.
+  This avoids a per-pkt function pointer jump overhead for such PMD's.
+
 
 Known Issues
 
diff --git a/lib/security/rte_security.c b/lib/security/rte_security.c
index e8116d5..fe81ed3 100644
--- a/lib/security/rte_security.c
+++ b/lib/security/rte_security.c
@@ -122,9 +122,9 @@ rte_security_session_destroy(struct rte_security_ctx 
*instance,
 }
 
 int
-rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
- struct rte_security_session *sess,
- struct rte_mbuf *m, void *params)
+__rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
+   struct rte_security_session *sess,
+   struct rte_mbuf *m, void *params)
 {
 #ifdef RTE_DEBUG
RTE_PTR_OR_ERR_RET(sess, -EINVAL);
@@ -137,7 +137,7 @@ rte_security_set_pkt_metadata(struct rte_security_ctx 
*instance,
 }
 
 void *
-rte_security_get_userdata(struct rte_security_ctx *instance, uint64_t md)
+__rte_security_get_userdata(struct rte_security_ctx *instance, uint64_t md)
 {
void *userdata = NULL;
 
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 2e136d7..3124134 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -71,8 +71,18 @@ struct rte_security_ctx {
/**< Pointer to security ops for the device */
uint16_t sess_cnt;
/**< Number of sessions attached to this context */
+   uint32_t flags;
+   /**< Flags for security context */
 };
 
+#define RTE_SEC_CTX_F_FAST_SET_MDATA 0x0001
+/**< Driver uses fast metadata update without using driver specific callback */
+
+#define RTE_SEC_CTX_F_FAST_GET_UDATA 0x0002
+/**< Driver provides udata using fast method without using driver specific
+ * callback.
+ */
+
 /**
  * IPSEC tunnel parameters
  *
@@ -494,6 +504,12 @@ static inline bool 
rte_security_dynfield_is_registered(void)
return rte_security_dynfield_offset >= 0;
 }
 
+/** Function to call PMD specific function pointer set_pkt_metadata() */
+__rte_experimental
+extern int __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
+  struct rte_security_session *sess,
+  struct rte_mbuf *m, void *params);
+
 /**
  *  Updates the buffer with device-specific defined metadata
  *
@@ -507,10 +523,26 @@ static inline bool 
rte_security_dynfield_is_registered(void)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
-int
+static inline int
 rte_security_set_p

[dpdk-dev] [PATCH v5 3/3] examples/ipsec-secgw: update event mode inline path

2021-09-14 Thread Nithin Dabilpuram
Update mbuf.l2_len with L2 header size for outbound
inline processing.

This patch also fixes a bug in arg parsing.

Signed-off-by: Nithin Dabilpuram 
Acked-by: Akhil Goyal 
---
 examples/ipsec-secgw/ipsec-secgw.c  |  2 ++
 examples/ipsec-secgw/ipsec_worker.c | 41 -
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec-secgw.c 
b/examples/ipsec-secgw/ipsec-secgw.c
index f252d34..7ad94cb 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -1495,6 +1495,8 @@ parse_portmask(const char *portmask)
char *end = NULL;
unsigned long pm;
 
+   errno = 0;
+
/* parse hexadecimal string */
pm = strtoul(portmask, &end, 16);
if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0'))
diff --git a/examples/ipsec-secgw/ipsec_worker.c 
b/examples/ipsec-secgw/ipsec_worker.c
index 647e22d..c545497 100644
--- a/examples/ipsec-secgw/ipsec_worker.c
+++ b/examples/ipsec-secgw/ipsec_worker.c
@@ -12,6 +12,11 @@
 #include "ipsec-secgw.h"
 #include "ipsec_worker.h"
 
+struct port_drv_mode_data {
+   struct rte_security_session *sess;
+   struct rte_security_ctx *ctx;
+};
+
 static inline enum pkt_type
 process_ipsec_get_pkt_type(struct rte_mbuf *pkt, uint8_t **nlp)
 {
@@ -60,7 +65,8 @@ ipsec_event_pre_forward(struct rte_mbuf *m, unsigned int 
port_id)
 
 static inline void
 prepare_out_sessions_tbl(struct sa_ctx *sa_out,
-   struct rte_security_session **sess_tbl, uint16_t size)
+struct port_drv_mode_data *data,
+uint16_t size)
 {
struct rte_ipsec_session *pri_sess;
struct ipsec_sa *sa;
@@ -95,9 +101,10 @@ prepare_out_sessions_tbl(struct sa_ctx *sa_out,
}
 
/* Use only first inline session found for a given port */
-   if (sess_tbl[sa->portid])
+   if (data[sa->portid].sess)
continue;
-   sess_tbl[sa->portid] = pri_sess->security.ses;
+   data[sa->portid].sess = pri_sess->security.ses;
+   data[sa->portid].ctx = pri_sess->security.ctx;
}
 }
 
@@ -356,9 +363,8 @@ process_ipsec_ev_outbound(struct ipsec_ctx *ctx, struct 
route_table *rt,
goto drop_pkt_and_exit;
}
 
-   if (sess->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA)
-   *(struct rte_security_session **)rte_security_dynfield(pkt) =
-   sess->security.ses;
+   rte_security_set_pkt_metadata(sess->security.ctx,
+ sess->security.ses, pkt, NULL);
 
/* Mark the packet for Tx security offload */
pkt->ol_flags |= PKT_TX_SEC_OFFLOAD;
@@ -367,6 +373,9 @@ process_ipsec_ev_outbound(struct ipsec_ctx *ctx, struct 
route_table *rt,
port_id = sa->portid;
 
 send_pkt:
+   /* Provide L2 len for Outbound processing */
+   pkt->l2_len = RTE_ETHER_HDR_LEN;
+
/* Update mac addresses */
update_mac_addrs(pkt, port_id);
 
@@ -398,7 +407,7 @@ static void
 ipsec_wrkr_non_burst_int_port_drv_mode(struct eh_event_link_info *links,
uint8_t nb_links)
 {
-   struct rte_security_session *sess_tbl[RTE_MAX_ETHPORTS] = { NULL };
+   struct port_drv_mode_data data[RTE_MAX_ETHPORTS];
unsigned int nb_rx = 0;
struct rte_mbuf *pkt;
struct rte_event ev;
@@ -412,6 +421,8 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct 
eh_event_link_info *links,
return;
}
 
+   memset(&data, 0, sizeof(struct port_drv_mode_data));
+
/* Get core ID */
lcore_id = rte_lcore_id();
 
@@ -422,8 +433,8 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct 
eh_event_link_info *links,
 * Prepare security sessions table. In outbound driver mode
 * we always use first session configured for a given port
 */
-   prepare_out_sessions_tbl(socket_ctx[socket_id].sa_out, sess_tbl,
-   RTE_MAX_ETHPORTS);
+   prepare_out_sessions_tbl(socket_ctx[socket_id].sa_out, data,
+RTE_MAX_ETHPORTS);
 
RTE_LOG(INFO, IPSEC,
"Launching event mode worker (non-burst - Tx internal port - "
@@ -460,19 +471,21 @@ ipsec_wrkr_non_burst_int_port_drv_mode(struct 
eh_event_link_info *links,
 
if (!is_unprotected_port(port_id)) {
 
-   if (unlikely(!sess_tbl[port_id])) {
+   if (unlikely(!data[port_id].sess)) {
rte_pktmbuf_free(pkt);
continue;
}
 
/* Save security session */
-   if (rte_security_dynfield_is_registered())
-   *(struct rte_security_session **)
-   rte_security_dynfield(pkt) =
-  

Re: [dpdk-dev] [PATCH v2] telemetry: add support for dicts of dicts

2021-09-14 Thread Power, Ciara
Hi Radu,

Thanks for adding the test, one more comment inline that I have just noticed.


>-Original Message-
>From: Nicolau, Radu 
>Sent: Friday 10 September 2021 12:28
>To: Power, Ciara 
>Cc: dev@dpdk.org; Richardson, Bruce ; Nicolau,
>Radu ; Doherty, Declan
>
>Subject: [PATCH v2] telemetry: add support for dicts of dicts
>
>Add support for dicts of dicts to telemetry library.
>
>Signed-off-by: Declan Doherty 
>Signed-off-by: Radu Nicolau 
>---
> app/test/test_telemetry_data.c | 29 +++
> lib/telemetry/telemetry.c  | 43 +++---
> lib/telemetry/telemetry_data.c |  3 ++-
> 3 files changed, 71 insertions(+), 4 deletions(-)
>
>diff --git a/app/test/test_telemetry_data.c b/app/test/test_telemetry_data.c
>index f34d691265..18b93db8ef 100644
>--- a/app/test/test_telemetry_data.c
>+++ b/app/test/test_telemetry_data.c
>@@ -200,6 +200,34 @@ test_dict_with_array_string_values(void)
>   "[\"\"]}}");
> }
>
>+static int
>+test_dict_with_dict_values(void)
>+{
>+  struct rte_tel_data *dict_of_dicts = rte_tel_data_alloc();
>+  rte_tel_data_start_dict(dict_of_dicts);
>+
>+  struct rte_tel_data *child_data = rte_tel_data_alloc();
>+  rte_tel_data_start_array(child_data, RTE_TEL_STRING_VAL);
>+
>+  struct rte_tel_data *child_data2 = rte_tel_data_alloc();
>+  rte_tel_data_start_array(child_data2, RTE_TEL_STRING_VAL);
>+
>+  memset(&response_data, 0, sizeof(response_data));
>+  rte_tel_data_start_dict(&response_data);
>+
>+  rte_tel_data_add_array_string(child_data, "");
>+  rte_tel_data_add_array_string(child_data2, "");
>+  rte_tel_data_add_dict_container(dict_of_dicts, "dict_0",
>+  child_data, 0);
>+  rte_tel_data_add_dict_container(dict_of_dicts, "dict_1",
>+  child_data2, 0);
>+  rte_tel_data_add_dict_container(&response_data, "dict_of_dicts",
>+  dict_of_dicts, 0);
>+
>+  return TEST_OUTPUT("{\"/test\":{\"dict_of_dicts\":{\"dict_0\":"
>+  "[\"\"],\"dict_1\":[\"\"]}}}");
>+}
>+
> static int
> test_array_with_array_string_values(void)
> {
>@@ -355,6 +383,7 @@ test_telemetry_data(void)
>   test_dict_with_array_int_values,
>   test_dict_with_array_u64_values,
>   test_dict_with_array_string_values,
>+  test_dict_with_dict_values,
>   test_array_with_array_int_values,
>   test_array_with_array_u64_values,
>   test_array_with_array_string_values }; diff --git
>a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c index
>8665db8d03..3f83476112 100644
>--- a/lib/telemetry/telemetry.c
>+++ b/lib/telemetry/telemetry.c
>@@ -24,7 +24,7 @@
> #include "telemetry_internal.h"
>
> #define MAX_CMD_LEN 56
>-#define MAX_HELP_LEN 64
>+#define MAX_HELP_LEN 128

This change will not do much - it will allow a longer help text to be given for 
the command on registration,
but when the user actually asks for help text for a command, there is a 
restriction on the size of the string value that is added to the dict reply,
which will truncate the help text:

In telemetry_data.c/rte_tel_data_add_dict_string:

vbytes = strlcpy(e->value.sval, val, RTE_TEL_MAX_STRING_LEN);

where RTE_TEL_MAX_STRING_LEN is 64

Maybe we could just increase RTE_TEL_MAX_STRING_LEN to 128 and replace use of 
MAX_HELP_LEN with that, to keep them aligned.

Thanks, 
Ciara



Re: [dpdk-dev] [PATCH v2] app/testpmd: add command to print representor info

2021-09-14 Thread Ferruh Yigit
On 8/31/2021 5:12 PM, Andrew Rybchenko wrote:
> From: Viacheslav Galaktionov 
> 
> Make it simpler to debug configurations and code related to the representor
> info API.
> 
> Signed-off-by: Viacheslav Galaktionov 
> Signed-off-by: Andrew Rybchenko 
> Reviewed-by: Andy Moreton 
> ---
> v2:
> - change output format to log just one line per range
> 
>  app/test-pmd/cmdline.c | 135 +
>  1 file changed, 135 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 82253bc751..ae700f9dd1 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -236,6 +236,10 @@ static void cmd_help_long_parsed(void *parsed_result,
>   "Show port supported ptypes"
>   " for a specific port\n\n"
>  
> + "show port (port_id) representor info\n"
> + "Show supported representors"
> + " for a specific port\n\n"
> +

What do you think extending existing "show port info #" command instead of
creating a new command for it?

Since "show port info #" is a well known command, it can simplify the usage.
When port is representor port it can display additional info.



[dpdk-dev] [PATCH v3] telemetry: add support for dicts of dicts

2021-09-14 Thread Radu Nicolau
Add support for dicts of dicts to telemetry library.
Increase the max string size to 128.

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
---
 app/test/test_telemetry_data.c | 29 
 lib/telemetry/rte_telemetry.h  |  2 +-
 lib/telemetry/telemetry.c  | 48 +-
 lib/telemetry/telemetry_data.c |  3 ++-
 4 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/app/test/test_telemetry_data.c b/app/test/test_telemetry_data.c
index f34d691265..18b93db8ef 100644
--- a/app/test/test_telemetry_data.c
+++ b/app/test/test_telemetry_data.c
@@ -200,6 +200,34 @@ test_dict_with_array_string_values(void)
"[\"\"]}}");
 }
 
+static int
+test_dict_with_dict_values(void)
+{
+   struct rte_tel_data *dict_of_dicts = rte_tel_data_alloc();
+   rte_tel_data_start_dict(dict_of_dicts);
+
+   struct rte_tel_data *child_data = rte_tel_data_alloc();
+   rte_tel_data_start_array(child_data, RTE_TEL_STRING_VAL);
+
+   struct rte_tel_data *child_data2 = rte_tel_data_alloc();
+   rte_tel_data_start_array(child_data2, RTE_TEL_STRING_VAL);
+
+   memset(&response_data, 0, sizeof(response_data));
+   rte_tel_data_start_dict(&response_data);
+
+   rte_tel_data_add_array_string(child_data, "");
+   rte_tel_data_add_array_string(child_data2, "");
+   rte_tel_data_add_dict_container(dict_of_dicts, "dict_0",
+   child_data, 0);
+   rte_tel_data_add_dict_container(dict_of_dicts, "dict_1",
+   child_data2, 0);
+   rte_tel_data_add_dict_container(&response_data, "dict_of_dicts",
+   dict_of_dicts, 0);
+
+   return TEST_OUTPUT("{\"/test\":{\"dict_of_dicts\":{\"dict_0\":"
+   "[\"\"],\"dict_1\":[\"\"]}}}");
+}
+
 static int
 test_array_with_array_string_values(void)
 {
@@ -355,6 +383,7 @@ test_telemetry_data(void)
test_dict_with_array_int_values,
test_dict_with_array_u64_values,
test_dict_with_array_string_values,
+   test_dict_with_dict_values,
test_array_with_array_int_values,
test_array_with_array_u64_values,
test_array_with_array_string_values };
diff --git a/lib/telemetry/rte_telemetry.h b/lib/telemetry/rte_telemetry.h
index 8776998b54..9d1bdb2e0e 100644
--- a/lib/telemetry/rte_telemetry.h
+++ b/lib/telemetry/rte_telemetry.h
@@ -11,7 +11,7 @@
 #define _RTE_TELEMETRY_H_
 
 /** Maximum length for string used in object. */
-#define RTE_TEL_MAX_STRING_LEN 64
+#define RTE_TEL_MAX_STRING_LEN 128
 /** Maximum length of string. */
 #define RTE_TEL_MAX_SINGLE_STRING_LEN 8192
 /** Maximum number of dictionary entries. */
diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 8665db8d03..8304fbf6e9 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -24,7 +24,6 @@
 #include "telemetry_internal.h"
 
 #define MAX_CMD_LEN 56
-#define MAX_HELP_LEN 64
 #define MAX_OUTPUT_LEN (1024 * 16)
 #define MAX_CONNECTIONS 10
 
@@ -36,7 +35,7 @@ client_handler(void *socket);
 struct cmd_callback {
char cmd[MAX_CMD_LEN];
telemetry_cb fn;
-   char help[MAX_HELP_LEN];
+   char help[RTE_TEL_MAX_STRING_LEN];
 };
 
 #ifndef RTE_EXEC_ENV_WINDOWS
@@ -75,7 +74,7 @@ rte_telemetry_register_cmd(const char *cmd, telemetry_cb fn, 
const char *help)
int i = 0;
 
if (strlen(cmd) >= MAX_CMD_LEN || fn == NULL || cmd[0] != '/'
-   || strlen(help) >= MAX_HELP_LEN)
+   || strlen(help) >= RTE_TEL_MAX_STRING_LEN)
return -EINVAL;
 
rte_spinlock_lock(&callback_sl);
@@ -95,7 +94,7 @@ rte_telemetry_register_cmd(const char *cmd, telemetry_cb fn, 
const char *help)
 
strlcpy(callbacks[i].cmd, cmd, MAX_CMD_LEN);
callbacks[i].fn = fn;
-   strlcpy(callbacks[i].help, help, MAX_HELP_LEN);
+   strlcpy(callbacks[i].help, help, RTE_TEL_MAX_STRING_LEN);
num_callbacks++;
rte_spinlock_unlock(&callback_sl);
 
@@ -157,8 +156,8 @@ container_to_json(const struct rte_tel_data *d, char 
*out_buf, size_t buf_len)
size_t used = 0;
unsigned int i;
 
-   if (d->type != RTE_TEL_ARRAY_U64 && d->type != RTE_TEL_ARRAY_INT
-   && d->type != RTE_TEL_ARRAY_STRING)
+   if (d->type != RTE_TEL_DICT && d->type != RTE_TEL_ARRAY_U64 &&
+   d->type != RTE_TEL_ARRAY_INT && d->type != RTE_TEL_ARRAY_STRING)
return snprintf(out_buf, buf_len, "null");
 
used = rte_tel_json_empty_array(out_buf, buf_len, 0);
@@ -177,6 +176,43 @@ container_to_json(const struct rte_tel_data *d, char 
*out_buf, size_t buf_len)
used = rte_tel_json_add_array_string(out_buf,
buf_len, used,
d->data.array[i].sval);
+ 

Re: [dpdk-dev] [PATCH] ethdev: remove experimental flag from getting intr fd API

2021-09-14 Thread Ferruh Yigit
On 9/1/2021 12:17 PM, Kinsella, Ray wrote:
> 
> 
> On 01/09/2021 09:53, Ferruh Yigit wrote:
>> On 9/1/2021 8:08 AM, Andrew Rybchenko wrote:
>>> On 9/1/21 4:50 AM, Xiaoyun Li wrote:
 Remove the experimental tag for rte_eth_dev_rx_intr_ctl_q_get_fd API
 that was introduced in 18.11 and have been around for 11 releases.

 Signed-off-by: Xiaoyun Li 
>>>
>>> Acked-by: Andrew Rybchenko 
>>>
>>
>> Acked-by: Ferruh Yigit 
>>
> Acked-by: Ray Kinsella 
> 

Applied to dpdk-next-net/main, thanks.


Re: [dpdk-dev] [PATCH v2] app/testpmd: add command to print representor info

2021-09-14 Thread Andrew Rybchenko
On 9/14/21 6:52 PM, Ferruh Yigit wrote:
> On 8/31/2021 5:12 PM, Andrew Rybchenko wrote:
>> From: Viacheslav Galaktionov 
>>
>> Make it simpler to debug configurations and code related to the representor
>> info API.
>>
>> Signed-off-by: Viacheslav Galaktionov 
>> Signed-off-by: Andrew Rybchenko 
>> Reviewed-by: Andy Moreton 
>> ---
>> v2:
>> - change output format to log just one line per range
>>
>>  app/test-pmd/cmdline.c | 135 +
>>  1 file changed, 135 insertions(+)
>>
>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
>> index 82253bc751..ae700f9dd1 100644
>> --- a/app/test-pmd/cmdline.c
>> +++ b/app/test-pmd/cmdline.c
>> @@ -236,6 +236,10 @@ static void cmd_help_long_parsed(void *parsed_result,
>>  "Show port supported ptypes"
>>  " for a specific port\n\n"
>>  
>> +"show port (port_id) representor info\n"
>> +"Show supported representors"
>> +" for a specific port\n\n"
>> +
> 
> What do you think extending existing "show port info #" command instead of
> creating a new command for it?

My fear with such approach is that output of the "show port
info #" is already too long and adding representors info
there will make it even much longer.

> Since "show port info #" is a well known command, it can simplify the usage.
> When port is representor port it can display additional info.
> 

Just to be clear: it will output information for "backer"
(or parent) port which should be used to create representors.


Re: [dpdk-dev] [PATCH v9] doc: add release milestones definition

2021-09-14 Thread Ferruh Yigit
On 9/14/2021 8:56 AM, Thomas Monjalon wrote:
> From: Asaf Penso 
> 
> Adding more information about the release milestones.
> This includes the scope of change, expectations, etc.
> 
> Signed-off-by: Asaf Penso 
> Signed-off-by: Thomas Monjalon 
> Acked-by: John McNamara 
> Acked-by: Ajit Khaparde 
> Acked-by: Bruce Richardson 
> Acked-by: Andrew Rybchenko 

Acked-by: Ferruh Yigit 

Thanks for the documentation.


Re: [dpdk-dev] [PATCH v2] app/testpmd: add command to print representor info

2021-09-14 Thread Ferruh Yigit
On 9/14/2021 5:17 PM, Andrew Rybchenko wrote:
> On 9/14/21 6:52 PM, Ferruh Yigit wrote:
>> On 8/31/2021 5:12 PM, Andrew Rybchenko wrote:
>>> From: Viacheslav Galaktionov 
>>>
>>> Make it simpler to debug configurations and code related to the representor
>>> info API.
>>>
>>> Signed-off-by: Viacheslav Galaktionov 
>>> Signed-off-by: Andrew Rybchenko 
>>> Reviewed-by: Andy Moreton 
>>> ---
>>> v2:
>>> - change output format to log just one line per range
>>>
>>>  app/test-pmd/cmdline.c | 135 +
>>>  1 file changed, 135 insertions(+)
>>>
>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
>>> index 82253bc751..ae700f9dd1 100644
>>> --- a/app/test-pmd/cmdline.c
>>> +++ b/app/test-pmd/cmdline.c
>>> @@ -236,6 +236,10 @@ static void cmd_help_long_parsed(void *parsed_result,
>>> "Show port supported ptypes"
>>> " for a specific port\n\n"
>>>  
>>> +   "show port (port_id) representor info\n"
>>> +   "Show supported representors"
>>> +   " for a specific port\n\n"
>>> +
>>
>> What do you think extending existing "show port info #" command instead of
>> creating a new command for it?
> 
> My fear with such approach is that output of the "show port
> info #" is already too long and adding representors info
> there will make it even much longer.
> 

That is fair concern, what about extend existing command with a new keyword to
just print representor info:
"show port info # representor"

>> Since "show port info #" is a well known command, it can simplify the usage.
>> When port is representor port it can display additional info.
>>
> 
> Just to be clear: it will output information for "backer"
> (or parent) port which should be used to create representors.
> 



Re: [dpdk-dev] [PATCH v8] doc: add release milestones definition

2021-09-14 Thread Thomas Monjalon
14/09/2021 18:11, Ajit Khaparde:
> On Tue, Sep 14, 2021 at 12:53 AM Thomas Monjalon  wrote:
> > 03/09/2021 17:35, Ferruh Yigit:
> > > On 9/3/2021 12:50 PM, Thomas Monjalon wrote:
> > > > 02/09/2021 18:33, Ferruh Yigit:
> > > >> On 8/26/2021 11:11 AM, Thomas Monjalon wrote:
> > > >>> +* Any issue found in -rc1 should be fixed.
> > > >>> +
> > > >>> +rc3
> > > >>> +~~~
> > > >>> +
> > > >>> +* Priority: applications. No application feature should be accepted 
> > > >>> after -rc3.
> > > >>> +* New functionality that does not depend on libraries update
> > > >>> +  can be integrated as part of -rc3.
> > > >>> +* The application change must include documentation in the relevant 
> > > >>> .rst files
> > > >>> +  (application-specific and release notes if significant).
> > > >>> +* Libraries and drivers cleanup are allowed.
> > > >>> +* Small driver reworks.
> > > >>> +* Critical and minor bug fixes.
> > > >>
> > > >> As mentioned before, my concern is this may create false impression 
> > > >> that bugs
> > > >> are fixed only in this phase. What about remove this line completely 
> > > >> and update
> > > >> below -rc4 one as 'Critical bug fixes only.'? I think that makes 
> > > >> intention more
> > > >> clear.
> > > >
> > > > I had added in -rc2: "Any issue found in -rc1 should be fixed."
> > > > Do you want to remove it as well?
> > >
> > > I think we can keep it, good to highlight one of the major tasks for -rc2 
> > > is to
> > > fix defects found in -rc1, and it doesn't limit fixes to ones found in 
> > > -rc1.
> >
> > Actually I think it is better to remove.
> > It looks weird to have it only in -rc2.
> I see you have sent the new version and it is looking really good.
> 
> We can mention that bug fixes are welcome at any point in the cycle,

I hope it is obvious.

> but priority will be given to critical fixes close to the release date.

Yes it says critical bug fixes only.





Re: [dpdk-dev] [PATCH v9] doc: add release milestones definition

2021-09-14 Thread Thomas Monjalon
14/09/2021 18:34, Ferruh Yigit:
> On 9/14/2021 8:56 AM, Thomas Monjalon wrote:
> > From: Asaf Penso 
> > 
> > Adding more information about the release milestones.
> > This includes the scope of change, expectations, etc.
> > 
> > Signed-off-by: Asaf Penso 
> > Signed-off-by: Thomas Monjalon 
> > Acked-by: John McNamara 
> > Acked-by: Ajit Khaparde 
> > Acked-by: Bruce Richardson 
> > Acked-by: Andrew Rybchenko 
> 
> Acked-by: Ferruh Yigit 
> 
> Thanks for the documentation.

Adding this last minute note at the end:
Bug fixes are integrated as early as possible at any stage.
Is it OK?




[dpdk-dev] [PATCH] net/bnxt: fix Rx queue startup state

2021-09-14 Thread Lance Richardson
Since the addition of support for runtime queue setup,
receive queues that are started by default no longer
have the correct state. Fix this by setting the state
when a port is started.

Fixes: 0105ea1296c9 ("net/bnxt: support runtime queue setup")
Signed-off-by: Lance Richardson 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Somnath Kotur 
Reviewed-by: Kalesh Anakkur Purayil 
---
 drivers/net/bnxt/bnxt_ethdev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index d6e3847963..097dd10de9 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -482,6 +482,12 @@ static int bnxt_setup_one_vnic(struct bnxt *bp, uint16_t 
vnic_id)
rxq->vnic->fw_grp_ids[j] = INVALID_HW_RING_ID;
else
vnic->rx_queue_cnt++;
+
+   if (!rxq->rx_deferred_start) {
+   bp->eth_dev->data->rx_queue_state[j] =
+   RTE_ETH_QUEUE_STATE_STARTED;
+   rxq->rx_started = true;
+   }
}
 
PMD_DRV_LOG(DEBUG, "vnic->rx_queue_cnt = %d\n", vnic->rx_queue_cnt);
-- 
2.25.1



Re: [dpdk-dev] [PATCH v3 1/3] eal/linux: make hugetlbfs analysis reusable

2021-09-14 Thread Dmitry Kozlyuk
> -Original Message-
> From: John Levon 
> Sent: 14 сентября 2021 г. 15:48
> To: Dmitry Kozlyuk 
> Cc: dev@dpdk.org; Anatoly Burakov ; Slava
> Ovsiienko 
> Subject: Re: [PATCH v3 1/3] eal/linux: make hugetlbfs analysis reusable
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Sep 14, 2021 at 01:34:54PM +0300, Dmitry Kozlyuk wrote:
> 
> > get_hugepage_dir() searched for a hugetlbfs mount with a given page
> > size using handcraft parsing of /proc/mounts and mixing traversal
> > logic with selecting the needed entry. Separate code to enumerate
> > hugetlbfs mounts to eal_hugepage_mount_walk() taking a callback that
> > can inspect already parsed entries. Use mntent(3) API for parsing.
> > This allows to reuse enumeration logic in subsequent patches.
> 
> Hi, are you planning to implement my pending change on top of this?

Yes, that's what I have in mind after your patch will be merged.


Re: [dpdk-dev] [PATCH v9] doc: add release milestones definition

2021-09-14 Thread Thomas Monjalon
14/09/2021 18:51, Ajit Khaparde:
> On Tue, Sep 14, 2021 at 9:50 AM Thomas Monjalon  wrote:
> >
> > 14/09/2021 18:34, Ferruh Yigit:
> > > On 9/14/2021 8:56 AM, Thomas Monjalon wrote:
> > > > From: Asaf Penso 
> > > >
> > > > Adding more information about the release milestones.
> > > > This includes the scope of change, expectations, etc.
> > > >
> > > > Signed-off-by: Asaf Penso 
> > > > Signed-off-by: Thomas Monjalon 
> > > > Acked-by: John McNamara 
> > > > Acked-by: Ajit Khaparde 
> > > > Acked-by: Bruce Richardson 
> > > > Acked-by: Andrew Rybchenko 
> > >
> > > Acked-by: Ferruh Yigit 
> > >
> > > Thanks for the documentation.
> >
> > Adding this last minute note at the end:
> > Bug fixes are integrated as early as possible at any stage.
> > Is it OK?
> Looks good to me.

Applied





Re: [dpdk-dev] [PATCH] lpm6: Fix missing ^ in documentation.

2021-09-14 Thread Medvedkin, Vladimir




On 14/09/2021 09:05, David Marchand wrote:

On Mon, Sep 13, 2021 at 8:47 PM Ben Pfaff  wrote:




This is probably due to conversion from ms word to rst format.

Fixes: fc1f2750a3ec ("doc: programmers guide")
Cc: sta...@dpdk.org


Signed-off-by: Ben Pfaff 


Reviewed-by: David Marchand 



Acked-by: Vladimir Medvedkin 





--
Regards,
Vladimir


Re: [dpdk-dev] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path

2021-09-14 Thread Honnappa Nagarahalli


> 
> Rx descriptor is 16B/32B in size and consists of multiple words.
> The word that includes DD field should be read first. Read result with DD bit
> set indicates the rest part in a descriptor is valid.
Suggest rewording as follows:
Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the 
rest of the descriptor words have valid values. Hence, the word containing DD 
bit must be read first before reading the rest of the descriptor words.

> 
> In functions for simple Rx, the descriptor is not read atomically in whole. On
> weaker ordered systems like aarch64, read of the word that includes DD field
> could be reordered after read of other words.
> In this case, some words could be invalid data.
Since the entire descriptor is not read atomically, on relaxed memory ordered 
systems like Aarch64, read of the word containing DD field could be reordered 
after read of other words.

> 
> Read barrier is inserted between read of the word with DD field and read of
> other words. The barrier ensures what fetched is correct descriptor data.
Suggest capturing the performance impact, so it is clearly documented.

> 
> Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ruifeng Wang 
With the above comments,
Reviewed-by: Honnappa Nagarahalli 

> ---
> The change should not impact performance on x86 as acquire fence is ignored
> on x86.
> 
>  drivers/net/i40e/i40e_rxtx.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> 8329cbdd4e..c4cd6b6b60 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -746,6 +746,12 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
>   break;
>   }
> 
> + /**
> +  * Use acquire fence to ensure that qword1 which includes DD
> +  * bit is loaded before loading of other descriptor words.
> +  */
> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
>   rxd = *rxdp;
>   nb_hold++;
>   rxe = &sw_ring[rx_id];
> @@ -862,6 +868,12 @@ i40e_recv_scattered_pkts(void *rx_queue,
>   break;
>   }
> 
> + /**
> +  * Use acquire fence to ensure that qword1 which includes DD
> +  * bit is loaded before loading of other descriptor words.
> +  */
> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
>   rxd = *rxdp;
>   nb_hold++;
>   rxe = &sw_ring[rx_id];
> --
> 2.25.1



[dpdk-dev] [PATCH v2] Warns if IPv4, UDP or TCP checksum offload not available

2021-09-14 Thread Usama Nadeem
From: usamanadeem321 

Checks if IPV4, UDP and TCP Checksum offloads are available.
If not available, prints a warning message.

Bugzilla ID: 545
Signed-off-by: usamanadeem321 
---
 examples/l3fwd/main.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 00ac267af1..ae62bc570d 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -123,7 +123,6 @@ static struct rte_eth_conf port_conf = {
.mq_mode = ETH_MQ_RX_RSS,
.max_rx_pkt_len = RTE_ETHER_MAX_LEN,
.split_hdr_size = 0,
-   .offloads = DEV_RX_OFFLOAD_CHECKSUM,
},
.rx_adv_conf = {
.rss_conf = {
@@ -1039,6 +1038,27 @@ l3fwd_poll_resource_setup(void)
local_port_conf.txmode.offloads |=
DEV_TX_OFFLOAD_MBUF_FAST_FREE;
 
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_IPV4_CKSUM)
+   local_port_conf.rxmode.offloads |=
+DEV_RX_OFFLOAD_IPV4_CKSUM;
+   else {
+   printf("WARNING: IPV4 Checksum offload not 
available.\n");
+   }
+
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_UDP_CKSUM)
+   local_port_conf.rxmode.offloads |=
+   DEV_RX_OFFLOAD_UDP_CKSUM;
+
+   else
+   printf("WARNING: UDP Checksum offload not 
available.\n");
+
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_CKSUM)
+   local_port_conf.rxmode.offloads |=
+   DEV_RX_OFFLOAD_TCP_CKSUM;
+
+   else
+   printf("WARNING: TCP Checksum offload not 
available.\n");
+
local_port_conf.rx_adv_conf.rss_conf.rss_hf &=
dev_info.flow_type_rss_offloads;
 
-- 
2.25.1



Re: [dpdk-dev] [PATCH] efd: change data type of parameter

2021-09-14 Thread David Christensen




On 9/14/21 12:10 AM, David Marchand wrote:

On Fri, Sep 10, 2021 at 6:54 PM Pablo de Lara
 wrote:


rte_efd_create() function was using uint8_t for a socket bitmask,
for one of its parameters.
This limits the maximum of NUMA sockets to be 8.
Changing to to uint64_t increases it to 64, which should be
more future-proof.


Cc: ppc maintainer, since I think powerX servers have non contiguous
NUMA sockets.


Definitely correct, POWER CPU NUMA sockets are not necessarily contiguous.

Can you update efd_autotest and efd_perf_autotest as well?  After 
applying this patch the test still fails on my POWER9 system:


$ sudo /home/drc/src/dpdk/build/app/test/dpdk-test -l 64-127 -n 4 --no-pci
...
RTE>>efd_autotest
Entering test_add_delete
EFD: At least one CPU socket must be enabled in the bitmask
EAL: Test assert test_add_delete line 125 failed: Error creating the EFD 
table


Test Failed
RTE>>

On this system lcores 64-127 reside on NUMA socket 8.

Dave


Re: [dpdk-dev] [PATCH v2] Warns if IPv4, UDP or TCP checksum offload not available

2021-09-14 Thread Stephen Hemminger
On Tue, 14 Sep 2021 23:08:27 +0500
Usama Nadeem  wrote:

> +
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_UDP_CKSUM)
> + local_port_conf.rxmode.offloads |=
> + DEV_RX_OFFLOAD_UDP_CKSUM;
> +
> + else
> + printf("WARNING: UDP Checksum offload not 
> available.\n");
> +
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_CKSUM)
> + local_port_conf.rxmode.offloads |=
> + DEV_RX_OFFLOAD_TCP_CKSUM;
> +
> + else
> + printf("WARNING: TCP Checksum offload not 
> available.\n");

Why does l3fwd care about L4 checksum offload?
The application should really be just a simple L3 router. But it
seems to have become a test for ptype and depends on TCP/UDP.


Re: [dpdk-dev] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path

2021-09-14 Thread Honnappa Nagarahalli

Similar comments that I have to patch 2/2

> 
> Rx descriptor is 16B/32B in size and consists of multiple words.
> The word that includes DD field should be read first. Read result with DD bit
> set indicates the rest part in a descriptor is valid.
Suggest rewording as follows:
Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the 
rest of the descriptor words have valid values. Hence, the word containing DD 
bit must be read first before reading the rest of the descriptor words.

> 
> In NEON vector PMD, vector load loads two contiguous 8B of descriptor data
> into vector register. Given vector load ensures no 16B atomicity, read of the
> word that includes DD field could be reordered after read of other words. In
> this case, some words could be invalid data.
"some words could contain invalid data"

> 
> Read barrier is added after read of qword1 that includes DD field.
> And qword0 is reloaded to update vector register. This ensures what fetched
> is correct descriptor data.
"This ensures that the fetched data is correct".

Suggest capturing the performance impact, so it is clearly documented.
> 
> Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ruifeng Wang 
With the above comments,
Reviewed-by: Honnappa Nagarahalli 

> ---
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> index b2683fda60..71191c7cc8 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue
> *__rte_restrict rxq,
>   descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
>   descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> 
> + /* Use acquire fence to order loads of descriptor qwords */
> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> + /* A.2 reload qword0 to make it ordered after qword1 load */
> + descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3],
> 0);
> + descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2],
> 0);
> + descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1],
> 0);
> + descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
> +
>   /* B.1 load 4 mbuf point */
>   mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
>   mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> --
> 2.25.1



[dpdk-dev] [PATCH] pipeline: improve handling of learner table action arguments

2021-09-14 Thread Cristian Dumitrescu
The arguments of actions that are learned are now specified as part of
the learn instruction as opposed to being statically specified as part
of the learner table configuration.

Signed-off-by: Cristian Dumitrescu 
---
Depends-on: series-18878 ("[V3,01/24] pipeline: move data structures to
internal header file")

 examples/pipeline/examples/learner.spec  |  6 +--
 lib/pipeline/rte_swx_pipeline.c  | 55 +---
 lib/pipeline/rte_swx_pipeline.h  | 10 -
 lib/pipeline/rte_swx_pipeline_internal.h |  8 ++--
 lib/pipeline/rte_swx_pipeline_spec.c | 35 ++-
 5 files changed, 31 insertions(+), 83 deletions(-)

diff --git a/examples/pipeline/examples/learner.spec 
b/examples/pipeline/examples/learner.spec
index d635422282..4ee52da7ac 100644
--- a/examples/pipeline/examples/learner.spec
+++ b/examples/pipeline/examples/learner.spec
@@ -84,7 +84,7 @@ action learn_action args none {
// Add the current lookup key to the table with fwd_action as the key 
action. The action
// arguments are read from the packet meta-data (the 
m.fwd_action_arg_port_out field). These
// packet meta-data fields have to be written before the "learn" 
instruction is invoked.
-   learn fwd_action
+   learn fwd_action m.fwd_action_arg_port_out
 
// Send the current packet to the same output port.
mov m.port_out m.fwd_action_arg_port_out
@@ -101,9 +101,9 @@ learner fwd_table {
}
 
actions {
-   fwd_action args m.fwd_action_arg_port_out
+   fwd_action
 
-   learn_action args none
+   learn_action
}
 
default_action learn_action args none
diff --git a/lib/pipeline/rte_swx_pipeline.c b/lib/pipeline/rte_swx_pipeline.c
index 31f0029404..1cd09a4b44 100644
--- a/lib/pipeline/rte_swx_pipeline.c
+++ b/lib/pipeline/rte_swx_pipeline.c
@@ -2359,6 +2359,9 @@ action_find(struct rte_swx_pipeline *p, const char *name);
 static int
 action_has_nbo_args(struct action *a);
 
+static int
+learner_action_args_check(struct rte_swx_pipeline *p, struct action *a, const 
char *mf_name);
+
 static int
 instr_learn_translate(struct rte_swx_pipeline *p,
  struct action *action,
@@ -2368,16 +2371,31 @@ instr_learn_translate(struct rte_swx_pipeline *p,
  struct instruction_data *data __rte_unused)
 {
struct action *a;
+   const char *mf_name;
+   uint32_t mf_offset = 0;
 
CHECK(action, EINVAL);
-   CHECK(n_tokens == 2, EINVAL);
+   CHECK((n_tokens == 2) || (n_tokens == 3), EINVAL);
 
a = action_find(p, tokens[1]);
CHECK(a, EINVAL);
CHECK(!action_has_nbo_args(a), EINVAL);
 
+   mf_name = (n_tokens > 2) ? tokens[2] : NULL;
+   CHECK(!learner_action_args_check(p, a, mf_name), EINVAL);
+
+   if (mf_name) {
+   struct field *mf;
+
+   mf = metadata_field_parse(p, mf_name);
+   CHECK(mf, EINVAL);
+
+   mf_offset = mf->offset / 8;
+   }
+
instr->type = INSTR_LEARNER_LEARN;
instr->learn.action_id = a->id;
+   instr->learn.mf_offset = mf_offset;
 
return 0;
 }
@@ -8165,7 +8183,6 @@ rte_swx_pipeline_learner_config(struct rte_swx_pipeline 
*p,
CHECK(params->action_names, EINVAL);
for (i = 0; i < params->n_actions; i++) {
const char *action_name = params->action_names[i];
-   const char *action_field_name = params->action_field_names[i];
struct action *a;
uint32_t action_data_size;
 
@@ -8174,10 +8191,6 @@ rte_swx_pipeline_learner_config(struct rte_swx_pipeline 
*p,
a = action_find(p, action_name);
CHECK(a, EINVAL);
 
-   status = learner_action_args_check(p, a, action_field_name);
-   if (status)
-   return status;
-
status = learner_action_learning_check(p,
   a,
   params->action_names,
@@ -8218,10 +8231,6 @@ rte_swx_pipeline_learner_config(struct rte_swx_pipeline 
*p,
if (!l->actions)
goto nomem;
 
-   l->action_arg = calloc(params->n_actions, sizeof(struct field *));
-   if (!l->action_arg)
-   goto nomem;
-
if (action_data_size_max) {
l->default_action_data = calloc(1, action_data_size_max);
if (!l->default_action_data)
@@ -8243,14 +8252,9 @@ rte_swx_pipeline_learner_config(struct rte_swx_pipeline 
*p,
 
l->header = header;
 
-   for (i = 0; i < params->n_actions; i++) {
-   const char *mf_name = params->action_field_names[i];
-
+   for (i = 0; i < params->n_actions; i++)
l->actions[i] = action_find(p, params->action_names[i]);
 
-   l->action_arg[i] = mf_name ? metadata_field_parse(p, mf_name) : 
NULL;
-   }
-
  

Re: [dpdk-dev] [PATCH v8] doc: add release milestones definition

2021-09-14 Thread Ajit Khaparde
On Tue, Sep 14, 2021 at 12:53 AM Thomas Monjalon  wrote:
>
> 03/09/2021 17:35, Ferruh Yigit:
> > On 9/3/2021 12:50 PM, Thomas Monjalon wrote:
> > > 02/09/2021 18:33, Ferruh Yigit:
> > >> On 8/26/2021 11:11 AM, Thomas Monjalon wrote:
> > >>> +* Any issue found in -rc1 should be fixed.
> > >>> +
> > >>> +rc3
> > >>> +~~~
> > >>> +
> > >>> +* Priority: applications. No application feature should be accepted 
> > >>> after -rc3.
> > >>> +* New functionality that does not depend on libraries update
> > >>> +  can be integrated as part of -rc3.
> > >>> +* The application change must include documentation in the relevant 
> > >>> .rst files
> > >>> +  (application-specific and release notes if significant).
> > >>> +* Libraries and drivers cleanup are allowed.
> > >>> +* Small driver reworks.
> > >>> +* Critical and minor bug fixes.
> > >>
> > >> As mentioned before, my concern is this may create false impression that 
> > >> bugs
> > >> are fixed only in this phase. What about remove this line completely and 
> > >> update
> > >> below -rc4 one as 'Critical bug fixes only.'? I think that makes 
> > >> intention more
> > >> clear.
> > >
> > > I had added in -rc2: "Any issue found in -rc1 should be fixed."
> > > Do you want to remove it as well?
> >
> > I think we can keep it, good to highlight one of the major tasks for -rc2 
> > is to
> > fix defects found in -rc1, and it doesn't limit fixes to ones found in -rc1.
>
> Actually I think it is better to remove.
> It looks weird to have it only in -rc2.
I see you have sent the new version and it is looking really good.

We can mention that bug fixes are welcome at any point in the cycle,
but priority will be given to critical fixes close to the release date.

>
>


Re: [dpdk-dev] [PATCH v9] doc: add release milestones definition

2021-09-14 Thread Ajit Khaparde
On Tue, Sep 14, 2021 at 9:50 AM Thomas Monjalon  wrote:
>
> 14/09/2021 18:34, Ferruh Yigit:
> > On 9/14/2021 8:56 AM, Thomas Monjalon wrote:
> > > From: Asaf Penso 
> > >
> > > Adding more information about the release milestones.
> > > This includes the scope of change, expectations, etc.
> > >
> > > Signed-off-by: Asaf Penso 
> > > Signed-off-by: Thomas Monjalon 
> > > Acked-by: John McNamara 
> > > Acked-by: Ajit Khaparde 
> > > Acked-by: Bruce Richardson 
> > > Acked-by: Andrew Rybchenko 
> >
> > Acked-by: Ferruh Yigit 
> >
> > Thanks for the documentation.
>
> Adding this last minute note at the end:
> Bug fixes are integrated as early as possible at any stage.
> Is it OK?
Looks good to me.

>
>


Re: [dpdk-dev] [PATCH v2] Warns if IPv4, UDP or TCP checksum offload not available

2021-09-14 Thread Ananyev, Konstantin



> 
> From: usamanadeem321 
> 
> Checks if IPV4, UDP and TCP Checksum offloads are available.
> If not available, prints a warning message.
> 
> Bugzilla ID: 545
> Signed-off-by: usamanadeem321 
> ---
>  examples/l3fwd/main.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> index 00ac267af1..ae62bc570d 100644
> --- a/examples/l3fwd/main.c
> +++ b/examples/l3fwd/main.c
> @@ -123,7 +123,6 @@ static struct rte_eth_conf port_conf = {
>   .mq_mode = ETH_MQ_RX_RSS,
>   .max_rx_pkt_len = RTE_ETHER_MAX_LEN,
>   .split_hdr_size = 0,
> - .offloads = DEV_RX_OFFLOAD_CHECKSUM,
>   },
>   .rx_adv_conf = {
>   .rss_conf = {
> @@ -1039,6 +1038,27 @@ l3fwd_poll_resource_setup(void)
>   local_port_conf.txmode.offloads |=
>   DEV_TX_OFFLOAD_MBUF_FAST_FREE;
> 
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_IPV4_CKSUM)
> + local_port_conf.rxmode.offloads |=
> +  DEV_RX_OFFLOAD_IPV4_CKSUM;
> + else {
> + printf("WARNING: IPV4 Checksum offload not 
> available.\n");
> + }
> +
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_UDP_CKSUM)
> + local_port_conf.rxmode.offloads |=
> + DEV_RX_OFFLOAD_UDP_CKSUM;
> +
> + else
> + printf("WARNING: UDP Checksum offload not 
> available.\n");
> +
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_CKSUM)
> + local_port_conf.rxmode.offloads |=
> + DEV_RX_OFFLOAD_TCP_CKSUM;
> +
> + else
> + printf("WARNING: TCP Checksum offload not 
> available.\n");
> +

Sorry, but I didn't get the logic:
Application expects some offloads to be supported by HW.
You add the code that checks for offloads, but if they are not supported just 
prints warning
and continues, as if everything is ok. Doesn't look like correct behaviour to 
me.
I think, it should either terminate with error message or be prepared to work 
properly
on HW without these offloads (check cksums in SW if necessary).
In fact I don't see what was wrong with original behaviour, one thing that 
probably
was missing - more descriptive error message. 

>   local_port_conf.rx_adv_conf.rss_conf.rss_hf &=
>   dev_info.flow_type_rss_offloads;
> 
> --
> 2.25.1



Re: [dpdk-dev] [PATCH v2] net/iavf: enable interrupt polling

2021-09-14 Thread Kadam, Pallavi



On 8/25/2021 1:34 AM, Robin Zhang wrote:

For VF hosted by Intel 700 series NICs, internal rx interrupt and adminq
interrupt share the same source, that cause a lot cpu cycles be wasted on
interrupt handler on rx path.

The patch disable pci interrupt and remove the interrupt handler, replace
it with a low frequency(50ms) interrupt polling daemon which is
implemtented by registering an alarm callback periodly.

The virtual channel capability bit VIRTCHNL_VF_OFFLOAD_WB_ON_ITR can be
used to negotiate if iavf PMD needs to enable background alarm or not, so
ideally this change will not impact the case hosted by Intel 800 series
NICS.

This patch implements the same logic with an early i40e commit:
commit 864a800d706d ("net/i40e: remove VF interrupt handler")

Signed-off-by: Robin Zhang 

v2:
- only enable interrupt polling for VF of i40e devices.

---

Acked-by: Pallavi Kadam 


Re: [dpdk-dev] [PATCH v2] Warns if IPv4, UDP or TCP checksum offload not available

2021-09-14 Thread Stephen Hemminger
On Tue, 14 Sep 2021 22:22:04 +
"Ananyev, Konstantin"  wrote:

> > 
> > From: usamanadeem321 
> > 
> > Checks if IPV4, UDP and TCP Checksum offloads are available.
> > If not available, prints a warning message.
> > 
> > Bugzilla ID: 545
> > Signed-off-by: usamanadeem321 
> > ---
> >  examples/l3fwd/main.c | 22 +-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> > index 00ac267af1..ae62bc570d 100644
> > --- a/examples/l3fwd/main.c
> > +++ b/examples/l3fwd/main.c
> > @@ -123,7 +123,6 @@ static struct rte_eth_conf port_conf = {
> > .mq_mode = ETH_MQ_RX_RSS,
> > .max_rx_pkt_len = RTE_ETHER_MAX_LEN,
> > .split_hdr_size = 0,
> > -   .offloads = DEV_RX_OFFLOAD_CHECKSUM,
> > },
> > .rx_adv_conf = {
> > .rss_conf = {
> > @@ -1039,6 +1038,27 @@ l3fwd_poll_resource_setup(void)
> > local_port_conf.txmode.offloads |=
> > DEV_TX_OFFLOAD_MBUF_FAST_FREE;
> > 
> > +   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_IPV4_CKSUM)
> > +   local_port_conf.rxmode.offloads |=
> > +DEV_RX_OFFLOAD_IPV4_CKSUM;
> > +   else {
> > +   printf("WARNING: IPV4 Checksum offload not 
> > available.\n");
> > +   }
> > +
> > +   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_UDP_CKSUM)
> > +   local_port_conf.rxmode.offloads |=
> > +   DEV_RX_OFFLOAD_UDP_CKSUM;
> > +
> > +   else
> > +   printf("WARNING: UDP Checksum offload not 
> > available.\n");
> > +
> > +   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_CKSUM)
> > +   local_port_conf.rxmode.offloads |=
> > +   DEV_RX_OFFLOAD_TCP_CKSUM;
> > +
> > +   else
> > +   printf("WARNING: TCP Checksum offload not 
> > available.\n");
> > +  
> 
> Sorry, but I didn't get the logic:
> Application expects some offloads to be supported by HW.

The application is expecting more offloads than is necessary for basic
IP level forwarding which is all the example is documented to do.

  "The application performs L3 forwarding."

> You add the code that checks for offloads, but if they are not supported just 
> prints warning
> and continues, as if everything is ok. Doesn't look like correct behaviour to 
> me.
> I think, it should either terminate with error message or be prepared to work 
> properly
> on HW without these offloads (check cksums in SW if necessary).
> In fact I don't see what was wrong with original behaviour, one thing that 
> probably
> was missing - more descriptive error message. 

It is not a problem with your patch, it is fine.

It is a problem in how l3fwd has grown and changed and no longer really what
was intended in the original version. There is no reason that the application
should be looking at L4 data. In fact, it shouldn't care if it gets TCP, UDP, 
SCP or DCCP;
but the application now depends on ptype.

It should be possible to do L3 forwarding independent of packet type.
The application only needs to look at Ether type and do IPv4 or IPv6 based on 
that.






[dpdk-dev] [PATCH] windows/netuio: add Intel Virtual Function device IDs

2021-09-14 Thread Pallavi Kadam
Add Intel Ethernet Virtual Function device IDs to netuio inf file
to support Intel 40GbE and 100GbE deives on Windows VM.

Signed-off-by: Pallavi Kadam 
Reviewed-by: Ranjit Menon 
---
 windows/netuio/netuio.inf | 12 
 1 file changed, 12 insertions(+)

diff --git a/windows/netuio/netuio.inf b/windows/netuio/netuio.inf
index d166868..816ff17 100644
--- a/windows/netuio/netuio.inf
+++ b/windows/netuio/netuio.inf
@@ -49,6 +49,12 @@ HKR,,Icon,,-5
 %Intel.F1599.Description%=netuio_Device, PCI\VEN_8086&DEV_1599
 %Intel.F159A.Description%=netuio_Device, PCI\VEN_8086&DEV_159A
 %Intel.F159B.Description%=netuio_Device, PCI\VEN_8086&DEV_159B
+%Intel.F154C.Description%=netuio_Device, PCI\VEN_8086&DEV_154C
+%Intel.F1571.Description%=netuio_Device, PCI\VEN_8086&DEV_1571
+%Intel.F1889.Description%=netuio_Device, PCI\VEN_8086&DEV_1889
+%Intel.F374D.Description%=netuio_Device, PCI\VEN_8086&DEV_374D
+%Intel.F37CD.Description%=netuio_Device, PCI\VEN_8086&DEV_37CD
+%Intel.F3759.Description%=netuio_Device, PCI\VEN_8086&DEV_3759
 %vmxnet3.Description%=netuio_Device, PCI\VEN_15AD&DEV_07B0
 
 [netuio_Device.NT]
@@ -114,6 +120,12 @@ Intel.F1593.Description = "Intel(R) Ethernet Controller 
E810-C for SFP"
 Intel.F1599.Description = "Intel(R) Ethernet Controller E810-XXV for backplane"
 Intel.F159A.Description = "Intel(R) Ethernet Controller E810-XXV for QSFP"
 Intel.F159B.Description = "Intel(R) Ethernet Controller E810-XXV for SFP"
+Intel.F154C.Description = "Intel(R) Ethernet Virtual Function 700 Series"
+Intel.F1571.Description = "Intel(R) Ethernet Virtual Function 700 Series"
+Intel.F1889.Description = "Intel(R) Ethernet Adaptive Virtual Function"
+Intel.F374D.Description = "Intel(R) X722 Virtual Function"
+Intel.F37CD.Description = "Intel(R) Ethernet Virtual Function 700 Series"
+Intel.F3759.Description = "Intel(R) X722 Virtual Function"
 vmxnet3.Description = "VMWare Paravirtualized Ethernet v3"
 netuio.SVCDESC = "netuio Service"
 
-- 
2.31.1.windows.1



Re: [dpdk-dev] [PATCH] net/iavf: fix mbuf leak

2021-09-14 Thread Zhang, Qi Z



> -Original Message-
> From: dev  On Behalf Of Qiming Chen
> Sent: Saturday, September 11, 2021 9:47 AM
> To: dev@dpdk.org
> Cc: Xing, Beilei ; Wu, Jingjing 
> ;
> Qiming Chen ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/iavf: fix mbuf leak
> 
> A local test found that repeated port start and stop operations during the
> continuous SSE vector bufflist receiving process will cause the mbuf resource
> to run out. The final positioning is when the port is stopped, the mbuf of the
> pkt_first_seg pointer is not released. Resources leak.
> The patch scheme is to judge whether the pointer is empty when the port is
> stopped, and release the corresponding mbuf if it is not empty.
> 
> Fixes: 69dd4c3d0898 ("net/avf: enable queue and device")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Qiming Chen 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



Re: [dpdk-dev] [PATCH] net/i40e: fix vf resource leakage problem

2021-09-14 Thread Zhang, Qi Z



> -Original Message-
> From: dev  On Behalf Of
> chenqiming_hua...@163.com
> Sent: Saturday, August 21, 2021 4:14 PM
> To: dev@dpdk.org
> Cc: Xing, Beilei ; Qiming Chen
> ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e: fix vf resource leakage problem
> 
> From: Qiming Chen 
> 
> In the i40evf_dev_init function, when the MAC memory alloc fails, the
> previously initialized vf resource is not released, resulting in leakage.
> The patch calls the i40evf_uninit_vf function in the abnormal branch to 
> release
> resources.
> 
> Fixes: 5c9222058df7 ("i40e: move to drivers/net/")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Qiming Chen 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



Re: [dpdk-dev] [PATCH v7] ethdev: add IPv4 and L4 checksum RSS offload types

2021-09-14 Thread Zhang, AlvinX
> -Original Message-
> From: Yigit, Ferruh 
> Sent: Tuesday, September 14, 2021 10:01 PM
> To: Zhang, AlvinX ; Zhang, Qi Z
> ; Guo, Junfeng 
> Cc: dev@dpdk.org; Ajit Khaparde ; Singh,
> Aman Deep 
> Subject: Re: [dpdk-dev] [PATCH v7] ethdev: add IPv4 and L4 checksum RSS
> offload types
> 
> On 8/31/2021 10:52 AM, Alvin Zhang wrote:
> > This patch defines new RSS offload types for IPv4 and
> > L4(TCP/UDP/SCTP) checksum, which are required when users want to
> > distribute packets based on the IPv4 or L4 checksum field.
> >
> > For example "flow create 0 ingress pattern eth / ipv4 / end actions
> > rss types ipv4-chksum end queues end / end", this flow causes all
> > matching packets to be distributed to queues on basis of IPv4
> > checksum.
> >
> > Signed-off-by: Alvin Zhang 
> > Acked-by: Ajit Khaparde 
> > Acked-by: Aman Deep Singh 
> > ---
> >
> > v6: rebase to eeedef70, update some note
> > v7: fix code style issues
> > ---
> >  app/test-pmd/cmdline.c |  4 +++-
> >  app/test-pmd/config.c  |  2 ++
> >  doc/guides/rel_notes/release_21_11.rst |  5 +
> >  lib/ethdev/rte_ethdev.h| 24
> 
> >  4 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 82253bc..656a311 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -2252,6 +2252,8 @@ struct cmd_config_rss {
> > rss_conf.rss_hf = ETH_RSS_ECPRI;
> > else if (!strcmp(res->value, "mpls"))
> > rss_conf.rss_hf = ETH_RSS_MPLS;
> > +   else if (!strcmp(res->value, "ipv4-chksum"))
> > +   rss_conf.rss_hf = ETH_RSS_IPV4_CHKSUM;
> > else if (!strcmp(res->value, "none"))
> > rss_conf.rss_hf = 0;
> > else if (!strcmp(res->value, "level-default")) { @@ -2323,7 +2325,7
> > @@ struct cmd_config_rss {
> > .help_str = "port config all rss "
> > "all|default|eth|vlan|ip|tcp|udp|sctp|ether|port|vxlan|geneve|"
> >
>   "nvgre|vxlan-gpe|l2tpv3|esp|ah|pfcp|ecpri|mpls|none|level-default|"
> > -   "level-outer|level-inner|",
> > +   "level-outer|level-inner|ipv4-chksum|",
> > .tokens = {
> > (void *)&cmd_config_rss_port,
> > (void *)&cmd_config_rss_keyword,
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
> > 31d8ba1..ece78f2 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -140,6 +140,8 @@
> > { "gtpu", ETH_RSS_GTPU },
> > { "ecpri", ETH_RSS_ECPRI },
> > { "mpls", ETH_RSS_MPLS },
> > +   { "ipv4-chksum", ETH_RSS_IPV4_CHKSUM },
> > +   { "l4-chksum", ETH_RSS_L4_CHKSUM },>{ NULL, 0 },
> >  };
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst
> > b/doc/guides/rel_notes/release_21_11.rst
> > index d707a55..fa29b13 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -55,6 +55,11 @@ New Features
> >   Also, make sure to start the actual text at the margin.
> >   ===
> >
> > +* **Add new RSS offload types for IPv4/L4 checksum in RSS flow.**
> > +
> > +  Add macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4
> and
> > + TCP/UDP/SCTP header checksum field can be used as input set for RSS.
> > +
> >
> >  Removed Items
> >  -
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > d2b27c3..e6734df 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -537,6 +537,30 @@ struct rte_eth_rss_conf {
> >  #define ETH_RSS_PPPOE (1ULL << 31)
> >  #define ETH_RSS_ECPRI (1ULL << 32)
> >  #define ETH_RSS_MPLS  (1ULL << 33)
> > +#define ETH_RSS_IPV4_CHKSUM   (1ULL << 34)
> > +
> > +/**
> > + * The ETH_RSS_L4_CHKSUM generally refers to a type of checksum field
> > +for
> 
> what does 'generally' means here? Is there a case it refers to something else?
> 
> > + * any L4 header, such as TCP, UDP and SCTP. It is similar to
> > + ETH_RSS_PORT,
> > + * it does not specify the type of L4 header.
> > + * We use this macro to replace below macro for constricting the use
> > + of RSS
> > + * offload bits:
> > + * ETH_RSS_IPV4_TCP_CHKSUM
> > + * ETH_RSS_IPV4_UDP_CHKSUM
> > + * ETH_RSS_IPV4_SCTP_CHKSUM
> > + * ETH_RSS_IPV6_TCP_CHKSUM
> > + * ETH_RSS_IPV6_UDP_CHKSUM
> > + * ETH_RSS_IPV6_SCTP_CHKSUM
> 
> As I get you are listing them here to say the 'ETH_RSS_L4_CHKSUM' replaces
> possible usage of above list, but my concern is it may confuse people as those
> macros exists (or did exist in the past), so what do you think to remove them?
> 
> 
> And just to confirm, we can't use this flag, 'ETH_RSS_L4_CHKSUM' anymore with
> 'rte_eth_rss_conf.rss_hf', right? Since it will be missing some context for 
> it.
> Which means some old APIs (and configuration) won't support this new offload,
> but only rte_flow will s

  1   2   >