Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread fengchengwen



On 2023/3/6 19:13, Konstantin Ananyev wrote:
> 
> 
> In the proactive error handling mode, the PMD will set the data path
> pointers to dummy functions and then try recovery, in this period the
> application may still invoking data path API. This will introduce a
> race-condition with data path which may lead to crash [1].
>
> Although the PMD added delay after setting data path pointers to cover
> the above race-condition, it reduces the probability, but it doesn't
> solve the problem.
>
> To solve the race-condition problem fundamentally, the following
> requirements are added:
> 1. The PMD should set the data path pointers to dummy functions after
> report RTE_ETH_EVENT_ERR_RECOVERING event.
> 2. The application should stop data path API invocation when process
> the RTE_ETH_EVENT_ERR_RECOVERING event.
> 3. The PMD should set the data path pointers to valid functions before
> report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 4. The application should enable data path API invocation when process
> the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>
>>>
>>> How this is solving the race-condition, by pushing responsibility to
>>> stop data path to application?
>>
>> Exactly, it becomes application responsibility to make sure data-path is
>> stopped/suspended before recovery will continue.
>>
>
> From documentation of the feature:
>
> ``
> Because the PMD recovers automatically,
> the application can only sense that the data flow is disconnected for a
> while and the control API returns an error in this period.
>
> In order to sense the error happening/recovering, as well as to restore
> some additional configuration, three events are available:
> ``
>
> It looks like initial design is to use events mainly inform application
> about what happened and mainly for re-configuration.
>
> Although I am don't disagree to involve the application, I am not sure
> that is part of current design.

 I thought we all agreed that initial design contain some fallacies that
 need to fixed, no?
 Statement that with current rte_ethdev design error recovery can be done
 without interaction with the app (to stop/suspend data/control path)
 is the main one I think.
 It needs some interaction with app layer, one way or another.

>>>
>>> What if application is not interested in recovery modes at all and not
>>> registered any callback for the recovery?
>>
>>
>> Are you saying there is no way for application to disable
>> automatic recovery in PMD if it is not interested
>> (or can't full-fill per-requesties for it)?
>> If so, then yes it is a problem and we need to fix it.
>> I assumed that such mechanism to disable unwanted events already exists,
>> but I can't find anything.
>> Wonder what would be the easiest way here - can PMD make a decision
>> based on callback return value, or do we need a new API to
>> enable/disable callbacks, or ...?
>>
>>
>
> As far as I can see automatic recovery is not configurable by app.
>
> But that is not all, PMD sends events to application but PMD can't know
> if application is handling them or not, so with current design PMD can't
> rely on to app.

 Well, PMD invokes user provided callback.
 One way to fix that problem - if there is no callback provided,
 or callback returns an error code - PMD can assume that recovery
 should not be done.
 That is probably not the best design choice, but at least it will allow
 to fix the problem without too many changes and introducing new API.
 That could be sort of a 'quick fix'.
 In a meanwhile we can think about new/better approach for that.

>>>
>>> -rc2 for 23.03 is a few days away.
>>>
>>> What do you think to have 'quick fix' as modifying how driver updates
>>> burst ops to prevent the race condition, for this release?

The 'quick fix', do you mean only update function pointer (without rxq setting) 
?
Currently the PMDs which announced support "proactive error handling mode" 
already
do this.

>>>
>>> And plan a design update for the next release?
>> +1 on the overall approach.
> 
> Yep, agree.

Hope for better solution.
And also, I notice only the openvswitch (from all open-source software which 
based-on DPDK)
registers RTE_ETH_EVENT_INTR_RESET callback .

Therefore, hope we build a recovery framework at the DPDK SDK level and be 
compatible
with RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanism.

>  
>>
>>>
>>>
>
>>> I think driver should not rely on application for this, unless
>>> application explicitly says (to driver) that it is handling recovery,
>>> right now there is no way for driver to kn

RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs

2023-03-07 Thread Yan, Zhirun


> -Original Message-
> From: Jerin Jacob 
> Sent: Thursday, March 2, 2023 9:58 PM
> To: Yan, Zhirun 
> Cc: dev@dpdk.org; jer...@marvell.com; kirankum...@marvell.com;
> ndabilpu...@marvell.com; Liang, Cunming ; Wang,
> Haiyue 
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Jerin Jacob 
> > > Sent: Monday, February 27, 2023 6:23 AM
> > > To: Yan, Zhirun 
> > > Cc: dev@dpdk.org; jer...@marvell.com; kirankum...@marvell.com;
> > > ndabilpu...@marvell.com; Liang, Cunming ;
> > > Wang, Haiyue 
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun  wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Jerin Jacob 
> > > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > > To: Yan, Zhirun 
> > > > > Cc: dev@dpdk.org; jer...@marvell.com; kirankum...@marvell.com;
> > > > > ndabilpu...@marvell.com; Liang, Cunming
> > > > > ; Wang, Haiyue 
> > > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker
> > > > > model APIs
> > > > >
> > > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan
> > > > > 
> > > wrote:
> > > > > >
> > > > > > Add new get/set APIs to configure graph worker model which is
> > > > > > used to determine which model will be chosen.
> > > > > >
> > > > > > Signed-off-by: Haiyue Wang 
> > > > > > Signed-off-by: Cunming Liang 
> > > > > > Signed-off-by: Zhirun Yan 
> > > > > > ---
> > > > > >  lib/graph/rte_graph_worker.h| 51
> > > +
> > > > > >  lib/graph/rte_graph_worker_common.h | 13 
> > > > > >  lib/graph/version.map   |  3 ++
> > > > > >  3 files changed, 67 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > > 100644
> > > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > > @@ -1,5 +1,56 @@
> > > > > >  #include "rte_graph_model_rtc.h"
> > > > > >
> > > > > > +static enum rte_graph_worker_model worker_model =
> > > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > > >
> > > > > This will break the multiprocess.
> > > >
> > > > Thanks. I will use TLS for per-thread local storage.
> > >
> > > If it needs to be used from secondary process, then it needs to be
> > > from memzone.
> > >
> >
> >
> > This filed will be set by primary process in initial stage, and then lcore 
> > will only
> read it.
> > I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It
> > seems not necessary to allocate from memzone.
> >
> > >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +/** Graph worker models */
> > > > > > +enum rte_graph_worker_model { #define WORKER_MODEL_DEFAULT
> > > > > > +"default"
> > > > >
> > > > > Why need strings?
> > > > > Also, every symbol in a public header file should start with
> > > > > RTE_ to avoid namespace conflict.
> > > >
> > > > It was used to config the model in app. I can put the string into 
> > > > example.
> > >
> > > OK
> > >
> > > >
> > > > >
> > > > > > +   RTE_GRAPH_MODEL_DEFAULT = 0, #define
> WORKER_MODEL_RTC
> > > > > > +"rtc"
> > > > > > +   RTE_GRAPH_MODEL_RTC,
> > > > >
> > > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > > enum
> > > > > itself.
> > > > Yes, will do in next version.
> > > >
> > > > >
> > > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > > >
> > > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > > RTE_GRAPH_MODEL_PIPELINE
> > > >
> > > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> > >
> > > Hybrid is very overloaded term, and it will be confusing
> > > (considering there will be new models in future).
> > > Please pick a word that really express the model working.
> > >
> >
> > In this case, the path is Node0 -> Node1 -> Node2 -> Node3 And Node1
> > and Node3 are binding with one core.
> >
> > Our model offers the ability to dispatch between cores.
> >
> > Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?
> 
> Some names, What I can think of
> 
> // MCORE->MULTI CORE
> 
> RTE_GRAPH_MODEL_MCORE_PIPELINE
> or
> RTE_GRAG_MODEL_MCORE_DISPATCH
> or
> RTE_GRAG_MODEL_MCORE_RING
> or
> RTE_GRAPH_MODEL_MULTI_CORE
> 

Thanks, I will use RTE_GRAG_MODEL_MCORE_DISPATCH as the name.

> >
> > + - - - - - -+ +- - - - - - - - - - - - - + + - - - - - -+
> > '  Core #0   ' '  Core #1   Core #1   ' '  Core #2   '
> > '' '  ' ''
> > ' ++ ' ' ++++ ' ' ++ '
> > ' | Node-0 | - - - ->| Node-1 || Node-3 |<- - - - | Node-2 | '
> > ' ++ ' ' ++++ ' ' ++ '
> > '' ' |' '  ^ '
> > + - - - - - -+ +

Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread fengchengwen



On 2023/3/7 13:34, Honnappa Nagarahalli wrote:
> 
> 
>> -Original Message-
>> From: Konstantin Ananyev 
>> Sent: Sunday, March 5, 2023 9:24 AM
>> To: Honnappa Nagarahalli ;
>> dev@dpdk.org; Chengwen Feng ;
>> tho...@monjalon.net; Ferruh Yigit ; Andrew
>> Rybchenko ; Kalesh AP > anakkur.pura...@broadcom.com>; Ajit Khaparde
>> (ajit.khapa...@broadcom.com) 
>> Cc: nd 
>> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error 
>> handling
>> mode
>>
>>
>>
>> In the proactive error handling mode, the PMD will set the data
>> path pointers to dummy functions and then try recovery, in this
>> period the application may still invoking data path API. This will
>> introduce a race-condition with data path which may lead to crash [1].
>>
>> Although the PMD added delay after setting data path pointers to
>> cover the above race-condition, it reduces the probability, but it
>> doesn't solve the problem.
>>
>> To solve the race-condition problem fundamentally, the following
>> requirements are added:
>> 1. The PMD should set the data path pointers to dummy functions after
>>  report RTE_ETH_EVENT_ERR_RECOVERING event.
> Do you mean to say, PMD should set the data path pointers after
> calling the
 call back function?
> The PMD is running in the context of multiple EAL threads. How do
> these
 threads synchronize such that only one thread sets these data pointers?

 As I understand this event callback supposed to be called in the
 context of EAL interrupt thread (whoever is more familiar with
 original idea, feel free to correct me if I missed something).
>>> I could not figure this out. It looks to be called from the data plane 
>>> thread
>> context.
>>> I also have a thought on alternate design at the end, appreciate if you can
>> take a look.
>>>
 How it is going to signal data-path threads that they need to
 stop/suspend calling data-path API - that's I suppose is left to 
 application
>> to decide...
 Same as right now it is application responsibility to stop data-path
 threads before doing dev_stop()/dev/_config()/etc.
>>> Ok, good, this expectation is not new. The application must have a
>> mechanism already.
>>>


>
>> 2. The application should stop data path API invocation when process
>>  the RTE_ETH_EVENT_ERR_RECOVERING event.
> Any thoughts on how an application can do this?
>>> We can ignore this question as there is already similar expectation set for
>> earlier functionalities.
>>>
>
>> 3. The PMD should set the data path pointers to valid functions before
>>  report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> 4. The application should enable data path API invocation when process
>>  the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> Do you mean to say that the application should not call the datapath
> APIs
 while the PMD is running the recovery process?

 Yes, I believe that's the intention.
>>> Ok, this is good and makes sense.
>>>

>>
>> Also, this patch introduce a driver internal function
>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>
>> [1]
>>

>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2
>> -
>> ashok.k.kal...@intel.com/
>>
>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Chengwen Feng 
>> ---
>>doc/guides/prog_guide/poll_mode_drv.rst | 20 +++-
>>lib/ethdev/ethdev_driver.c  |  8 +++
>>lib/ethdev/ethdev_driver.h  | 10 
>>lib/ethdev/rte_ethdev.h | 32 +++--
>>lib/ethdev/version.map  |  1 +
>>5 files changed, 46 insertions(+), 25 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
>> b/doc/guides/prog_guide/poll_mode_drv.rst
>> index c145a9066c..e380ff135a 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -638,14 +638,9 @@ different from the application invokes
>> recovery in PASSIVE mode,  the PMD automatically recovers from
>> error in PROACTIVE mode,  and only a small amount of work is
>> required for the
 application.
>>
>> -During error detection and automatic recovery, -the PMD sets the
>> data path pointers to dummy functions -(which will prevent the
>> crash), -and also make sure the control path operations fail with a
>> return
 code ``-EBUSY``.
>> -
>> -Because the PMD recovers automatically, -the application can only
>> sense that the data flow is disconnected for a while -and the
>> control API returns an error in this period.
>> +During error detection and automatic recovery, the PMD sets the
>>

Re: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated queues problem

2023-03-07 Thread Havlík Martin

Hi Slava,

as I no longer work on the DPDK centered project which led me to 
encounter said issue, I am no longer in touch with changes made to 
net/bonding and moreover I don't have access to resources to test the 
current state of things.


But that aside, the linked commit seems to fix the issue (again, cannot 
test it myself but I'll trust the time passed since the commit without 
further changes).


If archiving is what is done to obsolete patches, then yes, go ahead, 
please.


Wish you all the best,
Martin

Dne 2023-03-06 16:21, Slava Ovsiienko napsal:

Hi, Martin

Is this issue resolved by:  
http://git.dpdk.org/dpdk/commit/?id=f66323717e  ?


Should we archive the patch?

With best regards,
Slava




-Original Message-
From: Martin Havlik 
Sent: среда, 21 июля 2021 г. 18:59
To: xhavl...@stud.fit.vutbr.cz; Xiaoyun Li ; 
Ferruh Yigit
; Andrew Rybchenko 
;

Ajit Khaparde ; Haiyue Wang
; Ori Kam ; Haifei Luo
; Slava Ovsiienko ; Andrey
Vesnovaty ; Bing Zhao ;
Jiawei(Jonny) Wang ; Gregory Etelson
; Li Zhang 
Cc: dev@dpdk.org; Jan Viktorin 
Subject: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated 
queues

problem

In bonding mode 4 (8023ad), dedicated queues are not working on mlx5 
NICs.


Signed-off-by: Martin Havlik 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2c43719ad3..8a6edc2bad 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -2603,6 +2603,9 @@ when in mode 4 (link-aggregation-802.3ad)::

testpmd> set bonding lacp dedicated_queues (port_id) 
(enable|disable)


+.. note::
+   Dedicated queues `do not currently work
+   `__ on mlx5 NICs.

 set bonding agg_mode
 
--
2.27.0


RE: [PATCH v2] net/mlx5: enable hint in async table

2023-03-07 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Rongwei Liu 
> Sent: Monday, March 6, 2023 1:37 PM
> To: dev@dpdk.org
> Cc: Matan Azrad ; Slava Ovsiienko
> 
> Subject: [PATCH v2] net/mlx5: enable hint in async table
> 
> Driver gets the hint value from rte_table_attr for async flow.
> Parse the value and pass the supported value to hardware accordingly.
> 
> Signed-off-by: Rongwei Liu 
> ---

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


RE: [PATCH 0/5] net/mlx5: enhanced CQE compression layout

2023-03-07 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Alexander Kozyrev 
> Sent: Tuesday, February 28, 2023 6:43 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh ; Slava Ovsiienko
> ; Matan Azrad 
> Subject: [PATCH 0/5] net/mlx5: enhanced CQE compression layout
> 
> Support Enhanced CQE Compression Layout in mlx5 driver for better latency
> and SW utilization.
> 
> Alexander Kozyrev (5):
>   common/mlx5: detect enhanced CQE compression capability
>   common/mlx5: add CQE validity iteration count
>   net/mlx5: support enhanced CQE compression in Rx burst
>   net/mlx5: support enhanced CQE zipping in vector Rx burst
>   net/mlx5: enable enhanced CQE compression
> 
>  doc/guides/nics/mlx5.rst |  14 +-
>  doc/guides/rel_notes/release_23_03.rst   |   1 +
>  drivers/common/mlx5/mlx5_common.h|  57 +++-
>  drivers/common/mlx5/mlx5_common_devx.c   |   4 +-
>  drivers/common/mlx5/mlx5_devx_cmds.c |   3 +
>  drivers/common/mlx5/mlx5_devx_cmds.h |   2 +
>  drivers/common/mlx5/mlx5_prm.h   |  42 --
>  drivers/net/mlx5/mlx5.c  |  17 ++-
>  drivers/net/mlx5/mlx5.h  |   1 +
>  drivers/net/mlx5/mlx5_devx.c |   2 +
>  drivers/net/mlx5/mlx5_rx.c   | 175 +++
>  drivers/net/mlx5/mlx5_rx.h   |  13 +-
>  drivers/net/mlx5/mlx5_rxq.c  |   5 +-
>  drivers/net/mlx5/mlx5_rxtx_vec.c |  24 +++-
>  drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 108 ++
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h|  91 
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.h |  94 
>  17 files changed, 475 insertions(+), 178 deletions(-)
> 
> --
> 2.18.2

Series applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [PATCH v2 0/2] Add option to timestamp console log

2023-03-07 Thread Bruce Richardson
On Mon, Mar 06, 2023 at 10:18:14AM -0800, Stephen Hemminger wrote:
> This is a reprise of earlier patch to add timestamp to console
> messages.
> 
> Example:
> # dpdk-testpmd -l 1-4 -n 4 --vdev net_null0 --log-timestamp -- -i
> EAL: Detected CPU lcores: 16
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> Interactive-mode selected
> [   0.191407] testpmd: create a new mbuf pool : n=171456, 
> size=2176, socket=0
> [   0.191510] testpmd: preferred mempool ops selected: ring_mp_mc
> 
> 
> v2 
>- rebase to current DPDK
>- make linux log code common to freebsd
> 
> Stephen Hemminger (2):
>   eal: unify logging code for FreeBsd and Linux
>   eal: add option to put timestamp on console output
>
Series-acked-by: Bruce Richardson  


Re: [PATCH v3 0/2] Add option to timestamp console log

2023-03-07 Thread Bruce Richardson
On Tue, Mar 07, 2023 at 08:33:17AM +0100, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Monday, 6 March 2023 20.28
> > 
> > This is a reprise of earlier patch to add timestamp to console
> > messages.
> > 
> > Example:
> > # dpdk-testpmd -l 1-4 -n 4 --vdev net_null0 --log-timestamp -- -i
> > EAL: Detected CPU lcores: 16
> > EAL: Detected NUMA nodes: 1
> > EAL: Detected static linkage of DPDK
> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > Interactive-mode selected
> > [   0.191407] testpmd: create a new mbuf pool : n=171456,
> > size=2176, socket=0
> > [   0.191510] testpmd: preferred mempool ops selected: ring_mp_mc
> 
> This is useful.
> 
> Here's some feature creep: Since the timestamp output is configurable, there 
> is no need to settle on one specific timestamp format. The option could allow 
> a choice between MONOTONIC and REALTIME, with an option for REALTIME to 
> include the date in ISO 8601 format (-MM-DD). And REALTIME could be UTC 
> or local time. You could consider taking a format string for strftime(), with 
> the extension that %f expands to 6 digit microseconds like in Python [1].
> 
> [1]: 
> https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
> 
> With or without feature creep,
> 
> Series-acked-by: Morten Brørup 
>

Missed the fact there was already a v3 when I acked v2, so repeating here.

Series-acked-by: Bruce Richardson 

I'd rather not have the feature creep on it though - at least not yet!


Re: [PATCH v3 2/2] eal: add option to put timestamp on console output

2023-03-07 Thread fengchengwen
On 2023/3/7 3:28, Stephen Hemminger wrote:
> When debugging driver or startup issues, it is useful to have
> a timestamp on each message printed. The messages in syslog
> already have a timestamp, but often syslog is not available
> during testing. The timestamp format is chosen to look
> like the default Linux dmesg timestamp.
> 
> Example:
> [   0.40] EAL: Probing VFIO support...
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  .../freebsd_gsg/freebsd_eal_parameters.rst| 32 ++
>  doc/guides/linux_gsg/linux_eal_parameters.rst |  5 +++
>  lib/eal/common/eal_common_options.c   |  5 +++
>  lib/eal/common/eal_internal_cfg.h |  1 +
>  lib/eal/common/eal_options.h  |  2 +
>  lib/eal/unix/eal_log.c| 42 +--
>  6 files changed, 84 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst 
> b/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst
> index fba467a2ce92..99cff10e963c 100644
> --- a/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst
> +++ b/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst
> @@ -18,3 +18,35 @@ FreeBSD-specific EAL parameters
>  ---
>  
>  There are currently no FreeBSD-specific EAL command-line parameters 
> available.
> +
> +Other options
> +~
> +
> +*   ``--syslog ``
> +
> +Set syslog facility. Valid syslog facilities are::
> +
> +auth
> +cron
> +daemon
> +ftp
> +kern
> +lpr
> +mail
> +news
> +syslog
> +user
> +uucp
> +local0
> +local1
> +local2
> +local3
> +local4
> +local5
> +local6
> +local7

This should add to commit 1/2 [PATCH v3 1/2] eal: unify logging code for 
FreeBsd and Linux

> +
> +*   ``--log-timestamp``
> +
> +Add a timestamp of seconds and microseconds to each log message
> +written to standard output.
> diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst 
> b/doc/guides/linux_gsg/linux_eal_parameters.rst
> index ea8f38139119..719ca6851625 100644
> --- a/doc/guides/linux_gsg/linux_eal_parameters.rst
> +++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
> @@ -135,3 +135,8 @@ Other options
>  local5
>  local6
>  local7
> +
> +*   ``--log-timestamp``
> +
> +Add a timestamp of seconds and microseconds to each log message
> +written to standard output.
> diff --git a/lib/eal/common/eal_common_options.c 
> b/lib/eal/common/eal_common_options.c
> index 03059336987d..2d3d8e82f7f3 100644
> --- a/lib/eal/common/eal_common_options.c
> +++ b/lib/eal/common/eal_common_options.c
> @@ -76,6 +76,7 @@ eal_long_options[] = {
>   {OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM},
>   {OPT_LCORES,1, NULL, OPT_LCORES_NUM   },
>   {OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM},
> + {OPT_LOG_TIMESTAMP, 0, NULL, OPT_LOG_TIMESTAMP_NUM},
>   {OPT_TRACE, 1, NULL, OPT_TRACE_NUM},
>   {OPT_TRACE_DIR, 1, NULL, OPT_TRACE_DIR_NUM},
>   {OPT_TRACE_BUF_SIZE,1, NULL, OPT_TRACE_BUF_SIZE_NUM   },
> @@ -1833,6 +1834,9 @@ eal_parse_common_option(int opt, const char *optarg,
>   }
>   break;
>   }
> + case OPT_LOG_TIMESTAMP_NUM:
> + conf->log_timestamp = 1;
> + break;
>  
>  #ifndef RTE_EXEC_ENV_WINDOWS
>   case OPT_TRACE_NUM: {
> @@ -2194,6 +2198,7 @@ eal_common_usage(void)
>  "  --"OPT_PROC_TYPE" Type of this process 
> (primary|secondary|auto)\n"
>  #ifndef RTE_EXEC_ENV_WINDOWS
>  "  --"OPT_SYSLOG"Set syslog facility\n"
> +"  --"OPT_LOG_TIMESTAMP" Timestamp log output\n"
>  #endif
>  "  --"OPT_LOG_LEVEL"= Set global log level\n"
>  "  --"OPT_LOG_LEVEL"=:\n"
> diff --git a/lib/eal/common/eal_internal_cfg.h 
> b/lib/eal/common/eal_internal_cfg.h
> index 167ec501fa79..33144c3619dd 100644
> --- a/lib/eal/common/eal_internal_cfg.h
> +++ b/lib/eal/common/eal_internal_cfg.h
> @@ -85,6 +85,7 @@ struct internal_config {
>* per-node) non-legacy mode only.
>*/
>   volatile int syslog_facility; /**< facility passed to openlog() */
> + volatile uint8_t log_timestamp;   /**< add timestamp to console output 
> */
>   /** default interrupt mode for VFIO */
>   volatile enum rte_intr_mode vfio_intr_mode;
>   /** the shared VF token for VFIO-PCI bound PF and VFs devices */
> diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
> index 3cc9cb641284..cc9723868e3c 100644
> --- a/lib/eal/common/eal_options.h
> +++ b/lib/eal/common/eal_options.h
> @@ -35,6 +35,8 @@ enum {
>   OPT_LCORES_NUM,
>  #define OPT_LOG_LEVEL "log-level"
>   OPT_LOG_LEVEL_NUM,
> +#define OPT_LOG_TIMESTAMP "log-timestamp"
> + OPT_LOG_TIMESTA

Re: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated queues problem

2023-03-07 Thread Jan Viktorin
Hello all,

Mario (CC) will look at this.

H.

On Tue, 07 Mar 2023 10:00:35 +0100
Havlík Martin  wrote:

> Hi Slava,
> 
> as I no longer work on the DPDK centered project which led me to 
> encounter said issue, I am no longer in touch with changes made to 
> net/bonding and moreover I don't have access to resources to test the 
> current state of things.
> 
> But that aside, the linked commit seems to fix the issue (again,
> cannot test it myself but I'll trust the time passed since the commit
> without further changes).
> 
> If archiving is what is done to obsolete patches, then yes, go ahead, 
> please.
> 
> Wish you all the best,
> Martin
> 
> Dne 2023-03-06 16:21, Slava Ovsiienko napsal:
> > Hi, Martin
> > 
> > Is this issue resolved by:  
> > http://git.dpdk.org/dpdk/commit/?id=f66323717e  ?
> > 
> > Should we archive the patch?
> > 
> > With best regards,
> > Slava
> > 
> > 
> >   
> >> -Original Message-
> >> From: Martin Havlik 
> >> Sent: среда, 21 июля 2021 г. 18:59
> >> To: xhavl...@stud.fit.vutbr.cz; Xiaoyun Li ; 
> >> Ferruh Yigit
> >> ; Andrew Rybchenko 
> >> ;
> >> Ajit Khaparde ; Haiyue Wang
> >> ; Ori Kam ; Haifei Luo
> >> ; Slava Ovsiienko ;
> >> Andrey Vesnovaty ; Bing Zhao
> >> ; Jiawei(Jonny) Wang ;
> >> Gregory Etelson ; Li Zhang 
> >> Cc: dev@dpdk.org; Jan Viktorin 
> >> Subject: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated 
> >> queues
> >> problem
> >> 
> >> In bonding mode 4 (8023ad), dedicated queues are not working on
> >> mlx5 NICs.
> >> 
> >> Signed-off-by: Martin Havlik 
> >> ---
> >>  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >> index 2c43719ad3..8a6edc2bad 100644
> >> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> >> @@ -2603,6 +2603,9 @@ when in mode 4 (link-aggregation-802.3ad)::
> >>   
> >> testpmd> set bonding lacp dedicated_queues (port_id)   
> >> (enable|disable)
> >> 
> >> +.. note::
> >> +   Dedicated queues `do not currently work
> >> +   `__ on mlx5 NICs.
> >> 
> >>  set bonding agg_mode
> >>  
> >> --
> >> 2.27.0  



RE: [PATCH v2 2/3] examples/l3fwd: removed hash entry number

2023-03-07 Thread Konstantin Ananyev



> Fixes: e7e6dd643092 ("examples/l3fwd: support config file for EM")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kamalakshitha Aligeri 
> Reviewed-by: Honnappa Nagarahalli 
> ---
>  examples/l3fwd/l3fwd.h |  1 -
>  examples/l3fwd/main.c  | 37 +
>  2 files changed, 1 insertion(+), 37 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> index ca1426a687..b55855c932 100644
> --- a/examples/l3fwd/l3fwd.h
> +++ b/examples/l3fwd/l3fwd.h
> @@ -55,7 +55,6 @@
>  /* 32-bit has less address-space for hugepage memory, limit to 1M entries */
>  #define L3FWD_HASH_ENTRIES   (1024*1024*1)
>  #endif
> -#define HASH_ENTRY_NUMBER_DEFAULT16
> 
>  struct parm_cfg {
>   const char *rule_ipv4_name;
> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> index 71a3018036..a4f061537e 100644
> --- a/examples/l3fwd/main.c
> +++ b/examples/l3fwd/main.c
> @@ -89,7 +89,6 @@ uint32_t enabled_port_mask;
> 
>  /* Used only in exact match mode. */
>  int ipv6; /**< ipv6 is false by default. */
> -uint32_t hash_entry_number = HASH_ENTRY_NUMBER_DEFAULT;
> 
>  struct lcore_conf lcore_conf[RTE_MAX_LCORE];
> 
> @@ -395,7 +394,6 @@ print_usage(const char *prgname)
>   " [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
>   " [--max-pkt-len PKTLEN]"
>   " [--no-numa]"
> - " [--hash-entry-num]"
>   " [--ipv6]"
>   " [--parse-ptype]"
>   " [--per-port-pool]"
> @@ -419,7 +417,6 @@ print_usage(const char *prgname)
>   "  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for 
> port X\n"
>   "  --max-pkt-len PKTLEN: maximum packet length in decimal 
> (64-9600)\n"
>   "  --no-numa: Disable numa awareness\n"
> - "  --hash-entry-num: Specify the hash entry number in 
> hexadecimal to be setup\n"
>   "  --ipv6: Set if running ipv6 packets\n"
>   "  --parse-ptype: Set to use software to analyze packet type\n"
>   "  --per-port-pool: Use separate buffer pool per port\n"
> @@ -479,22 +476,6 @@ parse_portmask(const char *portmask)
>   return pm;
>  }
> 
> -static int
> -parse_hash_entry_number(const char *hash_entry_num)
> -{
> - char *end = NULL;
> - unsigned long hash_en;
> - /* parse hexadecimal string */
> - hash_en = strtoul(hash_entry_num, &end, 16);
> - if ((hash_entry_num[0] == '\0') || (end == NULL) || (*end != '\0'))
> - return -1;
> -
> - if (hash_en == 0)
> - return -1;
> -
> - return hash_en;
> -}
> -
>  static int
>  parse_config(const char *q_arg)
>  {
> @@ -852,14 +833,7 @@ parse_args(int argc, char **argv)
>   break;
> 
>   case CMD_LINE_OPT_HASH_ENTRY_NUM_NUM:
> - ret = parse_hash_entry_number(optarg);
> - if ((ret > 0) && (ret <= L3FWD_HASH_ENTRIES)) {
> - hash_entry_number = ret;
> - } else {
> - fprintf(stderr, "invalid hash entry number\n");
> - print_usage(prgname);
> - return -1;
> - }
> + fprintf(stderr, "Hash entry number will be ignored\n");
>   break;
> 
>   case CMD_LINE_OPT_PARSE_PTYPE_NUM:
> @@ -963,15 +937,6 @@ parse_args(int argc, char **argv)
>   lookup_mode = L3FWD_LOOKUP_LPM;
>   }
> 
> - /*
> -  * hash flag is valid only for
> -  * exact match, reset it to default for
> -  * longest-prefix match.
> -  */
> - if (lookup_mode == L3FWD_LOOKUP_LPM) {
> - hash_entry_number = HASH_ENTRY_NUMBER_DEFAULT;
> - }
> -
>   /* For ACL, update port config rss hash filter */
>   if (lookup_mode == L3FWD_LOOKUP_ACL) {
>   port_conf.rx_adv_conf.rss_conf.rss_hf |=
> --
> 2.25.1

Acked-by: Konstantin Ananyev 
 


> 



RE: [PATCH v2 3/3] doc/l3fwd: lpm supports IPv4 and IPv6 forwarding

2023-03-07 Thread Konstantin Ananyev



> -Original Message-
> From: Kamalakshitha Aligeri 
> Sent: Monday, March 6, 2023 4:25 PM
> To: jer...@marvell.com; tho...@monjalon.net; david.march...@redhat.com; 
> sean.morris...@intel.com; Konstantin Ananyev
> ; ruifeng.w...@arm.com; 
> honnappa.nagaraha...@arm.com
> Cc: dev@dpdk.org; n...@arm.com
> Subject: [PATCH v2 3/3] doc/l3fwd: lpm supports IPv4 and IPv6 forwarding
> 
> LPM based lookup supports both IPv4 and IPv6 forwarding.
> 
> Fixes: 6a094e328598 ("examples/l3fwd: implement FIB lookup method")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kamalakshitha Aligeri 
> Reviewed-by: Honnappa Nagarahalli 
> Reviewed-by: Ruifeng Wang 
> ---
>  doc/guides/sample_app_ug/l3_forward.rst | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/doc/guides/sample_app_ug/l3_forward.rst 
> b/doc/guides/sample_app_ug/l3_forward.rst
> index 94b22da01e..1cc2c1dd1d 100644
> --- a/doc/guides/sample_app_ug/l3_forward.rst
> +++ b/doc/guides/sample_app_ug/l3_forward.rst
> @@ -56,9 +56,8 @@ for the IPv4/IPv6 5-tuple syntax specifically.
>  The 5-tuple syntax consists of a source IP address, a destination IP address,
>  a source port, a destination port and a protocol identifier.
> 
> -In the sample application, hash-based, FIB-based and ACL-based forwarding 
> supports
> +In the sample application, hash-based, LPM-based, FIB-based and ACL-based 
> forwarding supports
>  both IPv4 and IPv6.
> -LPM-based forwarding supports IPv4 only.
>  During the initialization phase route rules for IPv4 and IPv6 are read from 
> rule files.
> 
>  Compiling the Application
> --

Acked-by: Konstantin Ananyev 
 

> 2.25.1
> 



RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread Konstantin Ananyev


> >
> > In the proactive error handling mode, the PMD will set the data path
> > pointers to dummy functions and then try recovery, in this period 
> > the
> > application may still invoking data path API. This will introduce a
> > race-condition with data path which may lead to crash [1].
> >
> > Although the PMD added delay after setting data path pointers to 
> > cover
> > the above race-condition, it reduces the probability, but it doesn't
> > solve the problem.
> >
> > To solve the race-condition problem fundamentally, the following
> > requirements are added:
> > 1. The PMD should set the data path pointers to dummy functions 
> > after
> > report RTE_ETH_EVENT_ERR_RECOVERING event.
> > 2. The application should stop data path API invocation when process
> > the RTE_ETH_EVENT_ERR_RECOVERING event.
> > 3. The PMD should set the data path pointers to valid functions 
> > before
> > report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > 4. The application should enable data path API invocation when 
> > process
> > the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >
> >>>
> >>> How this is solving the race-condition, by pushing responsibility to
> >>> stop data path to application?
> >>
> >> Exactly, it becomes application responsibility to make sure data-path 
> >> is
> >> stopped/suspended before recovery will continue.
> >>
> >
> > From documentation of the feature:
> >
> > ``
> > Because the PMD recovers automatically,
> > the application can only sense that the data flow is disconnected for a
> > while and the control API returns an error in this period.
> >
> > In order to sense the error happening/recovering, as well as to restore
> > some additional configuration, three events are available:
> > ``
> >
> > It looks like initial design is to use events mainly inform application
> > about what happened and mainly for re-configuration.
> >
> > Although I am don't disagree to involve the application, I am not sure
> > that is part of current design.
> 
>  I thought we all agreed that initial design contain some fallacies that
>  need to fixed, no?
>  Statement that with current rte_ethdev design error recovery can be done
>  without interaction with the app (to stop/suspend data/control path)
>  is the main one I think.
>  It needs some interaction with app layer, one way or another.
> 
> >>>
> >>> What if application is not interested in recovery modes at all and not
> >>> registered any callback for the recovery?
> >>
> >>
> >> Are you saying there is no way for application to disable
> >> automatic recovery in PMD if it is not interested
> >> (or can't full-fill per-requesties for it)?
> >> If so, then yes it is a problem and we need to fix it.
> >> I assumed that such mechanism to disable unwanted events already 
> >> exists,
> >> but I can't find anything.
> >> Wonder what would be the easiest way here - can PMD make a decision
> >> based on callback return value, or do we need a new API to
> >> enable/disable callbacks, or ...?
> >>
> >>
> >
> > As far as I can see automatic recovery is not configurable by app.
> >
> > But that is not all, PMD sends events to application but PMD can't know
> > if application is handling them or not, so with current design PMD can't
> > rely on to app.
> 
>  Well, PMD invokes user provided callback.
>  One way to fix that problem - if there is no callback provided,
>  or callback returns an error code - PMD can assume that recovery
>  should not be done.
>  That is probably not the best design choice, but at least it will allow
>  to fix the problem without too many changes and introducing new API.
>  That could be sort of a 'quick fix'.
>  In a meanwhile we can think about new/better approach for that.
> 
> >>>
> >>> -rc2 for 23.03 is a few days away.
> >>>
> >>> What do you think to have 'quick fix' as modifying how driver updates
> >>> burst ops to prevent the race condition, for this release?
> 
> The 'quick fix', do you mean only update function pointer (without rxq 
> setting) ?
> Currently the PMDs which announced support "proactive error handling mode" 
> already
> do this.

Really sorry guys, I was too fast on the keyboard, and didn't read properly 
what Ferruh suggested.
Reading it once again - no I don not agree with that.
It wouldn't fix anything, but will just add extra mess into the code.
Sorry again for the wrong reply.
Konstantin


> 
> >>>
> >>> And plan a design update for the next release?
> >> +1 on the overall approach.
> >
> > Yep, agree.
> 
> Hope for better solution.
> And als

RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread Konstantin Ananyev


> >
> >
> > 
> >  In the proactive error handling mode, the PMD will set the data
> >  path pointers to dummy functions and then try recovery, in this
> >  period the application may still invoking data path API. This will
> >  introduce a race-condition with data path which may lead to crash [1].
> > 
> >  Although the PMD added delay after setting data path pointers to
> >  cover the above race-condition, it reduces the probability, but it
> >  doesn't solve the problem.
> > 
> >  To solve the race-condition problem fundamentally, the following
> >  requirements are added:
> >  1. The PMD should set the data path pointers to dummy functions after
> >   report RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>> Do you mean to say, PMD should set the data path pointers after
> > >>> calling the
> > >> call back function?
> > >>> The PMD is running in the context of multiple EAL threads. How do
> > >>> these
> > >> threads synchronize such that only one thread sets these data pointers?
> > >>
> > >> As I understand this event callback supposed to be called in the
> > >> context of EAL interrupt thread (whoever is more familiar with
> > >> original idea, feel free to correct me if I missed something).
> > > I could not figure this out. It looks to be called from the data plane 
> > > thread
> > context.
> > > I also have a thought on alternate design at the end, appreciate if you 
> > > can
> > take a look.
> > >
> > >> How it is going to signal data-path threads that they need to
> > >> stop/suspend calling data-path API - that's I suppose is left to 
> > >> application
> > to decide...
> > >> Same as right now it is application responsibility to stop data-path
> > >> threads before doing dev_stop()/dev/_config()/etc.
> > > Ok, good, this expectation is not new. The application must have a
> > mechanism already.
> > >
> > >>
> > >>
> > >>>
> >  2. The application should stop data path API invocation when process
> >   the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>> Any thoughts on how an application can do this?
> > > We can ignore this question as there is already similar expectation set 
> > > for
> > earlier functionalities.
> > >
> > >>>
> >  3. The PMD should set the data path pointers to valid functions before
> >   report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >  4. The application should enable data path API invocation when process
> >   the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>> Do you mean to say that the application should not call the datapath
> > >>> APIs
> > >> while the PMD is running the recovery process?
> > >>
> > >> Yes, I believe that's the intention.
> > > Ok, this is good and makes sense.
> > >
> > >>
> > 
> >  Also, this patch introduce a driver internal function
> >  rte_eth_fp_ops_setup which used as an help function for PMD.
> > 
> >  [1]
> > 
> > >>
> > http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2
> >  -
> >  ashok.k.kal...@intel.com/
> > 
> >  Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >  Cc: sta...@dpdk.org
> > 
> >  Signed-off-by: Chengwen Feng 
> >  ---
> > doc/guides/prog_guide/poll_mode_drv.rst | 20 +++-
> > lib/ethdev/ethdev_driver.c  |  8 +++
> > lib/ethdev/ethdev_driver.h  | 10 
> > lib/ethdev/rte_ethdev.h | 32 
> >  +++--
> > lib/ethdev/version.map  |  1 +
> > 5 files changed, 46 insertions(+), 25 deletions(-)
> > 
> >  diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> >  b/doc/guides/prog_guide/poll_mode_drv.rst
> >  index c145a9066c..e380ff135a 100644
> >  --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >  +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >  @@ -638,14 +638,9 @@ different from the application invokes
> >  recovery in PASSIVE mode,  the PMD automatically recovers from
> >  error in PROACTIVE mode,  and only a small amount of work is
> >  required for the
> > >> application.
> > 
> >  -During error detection and automatic recovery, -the PMD sets the
> >  data path pointers to dummy functions -(which will prevent the
> >  crash), -and also make sure the control path operations fail with a
> >  return
> > >> code ``-EBUSY``.
> >  -
> >  -Because the PMD recovers automatically, -the application can only
> >  sense that the data flow is disconnected for a while -and the
> >  control API returns an error in this period.
> >  +During error detection and automatic recovery, the PMD sets the
> >  +data path pointers to dummy functions and also make sure the
> >  +control path operations failed with a return code ``-EBUSY``.
> > 
> > In order to sense the error happening/recovering,  as well as to
> >  restor

RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread Konstantin Ananyev


> > > In the proactive error handling mode, the PMD will set the data 
> > > path
> > > pointers to dummy functions and then try recovery, in this period 
> > > the
> > > application may still invoking data path API. This will introduce 
> > > a
> > > race-condition with data path which may lead to crash [1].
> > >
> > > Although the PMD added delay after setting data path pointers to 
> > > cover
> > > the above race-condition, it reduces the probability, but it 
> > > doesn't
> > > solve the problem.
> > >
> > > To solve the race-condition problem fundamentally, the following
> > > requirements are added:
> > > 1. The PMD should set the data path pointers to dummy functions 
> > > after
> > > report RTE_ETH_EVENT_ERR_RECOVERING event.
> > > 2. The application should stop data path API invocation when 
> > > process
> > > the RTE_ETH_EVENT_ERR_RECOVERING event.
> > > 3. The PMD should set the data path pointers to valid functions 
> > > before
> > > report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > > 4. The application should enable data path API invocation when 
> > > process
> > > the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >
> > >>>
> > >>> How this is solving the race-condition, by pushing responsibility to
> > >>> stop data path to application?
> > >>
> > >> Exactly, it becomes application responsibility to make sure 
> > >> data-path is
> > >> stopped/suspended before recovery will continue.
> > >>
> > >
> > > From documentation of the feature:
> > >
> > > ``
> > > Because the PMD recovers automatically,
> > > the application can only sense that the data flow is disconnected for 
> > > a
> > > while and the control API returns an error in this period.
> > >
> > > In order to sense the error happening/recovering, as well as to 
> > > restore
> > > some additional configuration, three events are available:
> > > ``
> > >
> > > It looks like initial design is to use events mainly inform 
> > > application
> > > about what happened and mainly for re-configuration.
> > >
> > > Although I am don't disagree to involve the application, I am not sure
> > > that is part of current design.
> > 
> >  I thought we all agreed that initial design contain some fallacies that
> >  need to fixed, no?
> >  Statement that with current rte_ethdev design error recovery can be 
> >  done
> >  without interaction with the app (to stop/suspend data/control path)
> >  is the main one I think.
> >  It needs some interaction with app layer, one way or another.
> > 
> > >>>
> > >>> What if application is not interested in recovery modes at all and 
> > >>> not
> > >>> registered any callback for the recovery?
> > >>
> > >>
> > >> Are you saying there is no way for application to disable
> > >> automatic recovery in PMD if it is not interested
> > >> (or can't full-fill per-requesties for it)?
> > >> If so, then yes it is a problem and we need to fix it.
> > >> I assumed that such mechanism to disable unwanted events already 
> > >> exists,
> > >> but I can't find anything.
> > >> Wonder what would be the easiest way here - can PMD make a decision
> > >> based on callback return value, or do we need a new API to
> > >> enable/disable callbacks, or ...?
> > >>
> > >>
> > >
> > > As far as I can see automatic recovery is not configurable by app.
> > >
> > > But that is not all, PMD sends events to application but PMD can't 
> > > know
> > > if application is handling them or not, so with current design PMD 
> > > can't
> > > rely on to app.
> > 
> >  Well, PMD invokes user provided callback.
> >  One way to fix that problem - if there is no callback provided,
> >  or callback returns an error code - PMD can assume that recovery
> >  should not be done.
> >  That is probably not the best design choice, but at least it will allow
> >  to fix the problem without too many changes and introducing new API.
> >  That could be sort of a 'quick fix'.
> >  In a meanwhile we can think about new/better approach for that.
> > 
> > >>>
> > >>> -rc2 for 23.03 is a few days away.
> > >>>
> > >>> What do you think to have 'quick fix' as modifying how driver updates
> > >>> burst ops to prevent the race condition, for this release?
> >
> > The 'quick fix', do you mean only update function pointer (without rxq 
> > setting) ?
> > Currently the PMDs which announced support "proactive error handling mode" 
> > already
> > do this.
> 
> Really sorry guys, I was too fast on the keyboard, and didn't read properly 
> what 

Re: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated queues problem

2023-03-07 Thread Mário Kuka

Hello Slava,

I confirm that the patch http://git.dpdk.org/dpdk/commit/?id=f66323717e 
has resolved this issue.

I have tested that after applying the patch, the problem no longer occurs.

With best regards,
Mário

On 07/03/2023 10:37, Jan Viktorin wrote:

Hello all,

Mario (CC) will look at this.

H.

On Tue, 07 Mar 2023 10:00:35 +0100
Havlík Martin  wrote:


Hi Slava,

as I no longer work on the DPDK centered project which led me to
encounter said issue, I am no longer in touch with changes made to
net/bonding and moreover I don't have access to resources to test the
current state of things.

But that aside, the linked commit seems to fix the issue (again,
cannot test it myself but I'll trust the time passed since the commit
without further changes).

If archiving is what is done to obsolete patches, then yes, go ahead,
please.

Wish you all the best,
Martin

Dne 2023-03-06 16:21, Slava Ovsiienko napsal:

Hi, Martin

Is this issue resolved by:
http://git.dpdk.org/dpdk/commit/?id=f66323717e  ?

Should we archive the patch?

With best regards,
Slava


   

-Original Message-
From: Martin Havlik 
Sent: среда, 21 июля 2021 г. 18:59
To: xhavl...@stud.fit.vutbr.cz; Xiaoyun Li ;
Ferruh Yigit
; Andrew Rybchenko
;
Ajit Khaparde ; Haiyue Wang
; Ori Kam ; Haifei Luo
; Slava Ovsiienko ;
Andrey Vesnovaty ; Bing Zhao
; Jiawei(Jonny) Wang ;
Gregory Etelson ; Li Zhang 
Cc: dev@dpdk.org; Jan Viktorin 
Subject: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated
queues
problem

In bonding mode 4 (8023ad), dedicated queues are not working on
mlx5 NICs.

Signed-off-by: Martin Havlik 
---
  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2c43719ad3..8a6edc2bad 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -2603,6 +2603,9 @@ when in mode 4 (link-aggregation-802.3ad)::
   
 testpmd> set bonding lacp dedicated_queues (port_id)

(enable|disable)

+.. note::
+   Dedicated queues `do not currently work
+   `__ on mlx5 NICs.

  set bonding agg_mode
  
--
2.27.0




smime.p7s
Description: S/MIME Cryptographic Signature


RE: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated queues problem

2023-03-07 Thread Slava Ovsiienko
Hi,

Thank you, Mario, for confirming.
So, patch is not actual anymore.

With best regards,
Slava

> -Original Message-
> From: Mário Kuka 
> Sent: вторник, 7 марта 2023 г. 12:26
> To: Jan Viktorin ; Slava Ovsiienko
> 
> Cc: Havlík Martin ; Xiaoyun Li
> ; Ferruh Yigit ; Andrew
> Rybchenko ; Ajit Khaparde
> ; Haiyue Wang ; Ori
> Kam ; Haifei Luo ; Andrey Vesnovaty
> ; Bing Zhao ; Jiawei(Jonny) Wang
> ; Gregory Etelson ; Li Zhang
> ; dev@dpdk.org
> Subject: Re: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated queues
> problem
> 
> Hello Slava,
> 
> I confirm that the patch http://git.dpdk.org/dpdk/commit/?id=f66323717e
> has resolved this issue.
> I have tested that after applying the patch, the problem no longer occurs.
> 
> With best regards,
> Mário
> 
> On 07/03/2023 10:37, Jan Viktorin wrote:
> > Hello all,
> >
> > Mario (CC) will look at this.
> >
> > H.
> >
> > On Tue, 07 Mar 2023 10:00:35 +0100
> > Havlík Martin  wrote:
> >
> >> Hi Slava,
> >>
> >> as I no longer work on the DPDK centered project which led me to
> >> encounter said issue, I am no longer in touch with changes made to
> >> net/bonding and moreover I don't have access to resources to test the
> >> current state of things.
> >>
> >> But that aside, the linked commit seems to fix the issue (again,
> >> cannot test it myself but I'll trust the time passed since the commit
> >> without further changes).
> >>
> >> If archiving is what is done to obsolete patches, then yes, go ahead,
> >> please.
> >>
> >> Wish you all the best,
> >> Martin
> >>
> >> Dne 2023-03-06 16:21, Slava Ovsiienko napsal:
> >>> Hi, Martin
> >>>
> >>> Is this issue resolved by:
> >>> http://git.dpdk.org/dpdk/commit/?id=f66323717e  ?
> >>>
> >>> Should we archive the patch?
> >>>
> >>> With best regards,
> >>> Slava
> >>>
> >>>
> >>>
>  -Original Message-
>  From: Martin Havlik 
>  Sent: среда, 21 июля 2021 г. 18:59
>  To: xhavl...@stud.fit.vutbr.cz; Xiaoyun Li ;
>  Ferruh Yigit
>  ; Andrew Rybchenko
>  ;
>  Ajit Khaparde ; Haiyue Wang
>  ; Ori Kam ; Haifei Luo
>  ; Slava Ovsiienko ;
>  Andrey Vesnovaty ; Bing Zhao
>  ; Jiawei(Jonny) Wang ;
>  Gregory Etelson ; Li Zhang 
>  Cc: dev@dpdk.org; Jan Viktorin 
>  Subject: [PATCH 4/4] doc: note that testpmd on mlx5 has dedicated
>  queues
>  problem
> 
>  In bonding mode 4 (8023ad), dedicated queues are not working on
>  mlx5 NICs.
> 
>  Signed-off-by: Martin Havlik 
>  ---
>    doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 +++
>    1 file changed, 3 insertions(+)
> 
>  diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>  b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>  index 2c43719ad3..8a6edc2bad 100644
>  --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>  +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>  @@ -2603,6 +2603,9 @@ when in mode 4 (link-aggregation-802.3ad)::
> 
>   testpmd> set bonding lacp dedicated_queues (port_id)
>  (enable|disable)
> 
>  +.. note::
>  +   Dedicated queues `do not currently work
>  +   `__ on mlx5 NICs.
> 
>    set bonding agg_mode
>    
>  --
>  2.27.0



[PATCH] net/mlx5: fix the hairpin Tx queue reference count

2023-03-07 Thread Bing Zhao
When calling the haipin unbind interface, all the hairpin Tx queues
of the port will be unbound from the peer Rx queues. If one of the
Tx queue is working in the auto bind mode, the interface will return
directly.

Only when the Tx and peer Rx ports are the same, the auto bind mode
is supported. In this condition branch, the Tx queue release is
missed and the reference count is not decreased. Then in the port
stop stage, the hardware resources of this Tx queue won't be
freed. There would be some assertion or failure when starting the
port again.

With this commit, the reference count will be operated correctly.

Fixes: 37cd4501e873 ("net/mlx5: support two ports hairpin mode")
Cc: sta...@dpdk.org

Signed-off-by: Bing Zhao 
Acked-by: Matan Azrad 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_trigger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3457bf65d3..bbaa7d2aa0 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -896,11 +896,11 @@ mlx5_hairpin_unbind_single_port(struct rte_eth_dev *dev, 
uint16_t rx_port)
}
/* Indeed, only the first used queue needs to be checked. */
if (txq_ctrl->hairpin_conf.manual_bind == 0) {
+   mlx5_txq_release(dev, i);
if (cur_port != rx_port) {
rte_errno = EINVAL;
DRV_LOG(ERR, "port %u and port %u are in"
" auto-bind mode", cur_port, rx_port);
-   mlx5_txq_release(dev, i);
return -rte_errno;
} else {
return 0;
-- 
2.31.1



[PATCH v2 1/1] drivers: remove implementation of Rx metadata negotiation

2023-03-07 Thread Hanumanth Pothula
Presently, Rx metadata is sent to PMD by default, leading
to a performance drop as processing for the same in Rx path
takes extra cycles.

Hence, removing driver implementation of Rx metadata negotiation
and falling back to old implementation where mark actions are
tracked as part of the flow rule.

Signed-off-by: Hanumanth Pothula 
---
v2: Remove explicit initializations.
---
 drivers/common/cnxk/roc_npc.c  | 19 +++
 drivers/common/cnxk/roc_npc.h  |  3 +++
 drivers/common/cnxk/roc_npc_priv.h |  1 +
 drivers/common/cnxk/version.map|  2 ++
 drivers/net/cnxk/cn10k_ethdev.c| 26 --
 drivers/net/cnxk/cn10k_flow.c  | 19 +++
 drivers/net/cnxk/cn9k_ethdev.c | 25 -
 drivers/net/cnxk/cn9k_flow.c   | 20 
 drivers/net/cnxk/cnxk_ethdev.h |  1 -
 9 files changed, 64 insertions(+), 52 deletions(-)

diff --git a/drivers/common/cnxk/roc_npc.c b/drivers/common/cnxk/roc_npc.c
index a795114326..47536c8ce8 100644
--- a/drivers/common/cnxk/roc_npc.c
+++ b/drivers/common/cnxk/roc_npc.c
@@ -5,6 +5,23 @@
 #include "roc_api.h"
 #include "roc_priv.h"
 
+int
+roc_npc_mark_actions_get(struct roc_npc *roc_npc)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+
+   return npc->mark_actions;
+}
+
+int
+roc_npc_mark_actions_sub_return(struct roc_npc *roc_npc, uint32_t count)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+
+   npc->mark_actions -= count;
+   return npc->mark_actions;
+}
+
 int
 roc_npc_vtag_actions_get(struct roc_npc *roc_npc)
 {
@@ -488,12 +505,14 @@ npc_parse_actions(struct roc_npc *roc_npc, const struct 
roc_npc_attr *attr,
}
mark = act_mark->id + 1;
req_act |= ROC_NPC_ACTION_TYPE_MARK;
+   npc->mark_actions += 1;
flow->match_id = mark;
break;
 
case ROC_NPC_ACTION_TYPE_FLAG:
mark = NPC_FLOW_FLAG_VAL;
req_act |= ROC_NPC_ACTION_TYPE_FLAG;
+   npc->mark_actions += 1;
break;
 
case ROC_NPC_ACTION_TYPE_COUNT:
diff --git a/drivers/common/cnxk/roc_npc.h b/drivers/common/cnxk/roc_npc.h
index 5e07e26a91..61d0628f5f 100644
--- a/drivers/common/cnxk/roc_npc.h
+++ b/drivers/common/cnxk/roc_npc.h
@@ -397,6 +397,9 @@ int __roc_api roc_npc_mcam_free_all_resources(struct 
roc_npc *roc_npc);
 void __roc_api roc_npc_flow_dump(FILE *file, struct roc_npc *roc_npc);
 void __roc_api roc_npc_flow_mcam_dump(FILE *file, struct roc_npc *roc_npc,
  struct roc_npc_flow *mcam);
+int __roc_api roc_npc_mark_actions_get(struct roc_npc *roc_npc);
+int __roc_api roc_npc_mark_actions_sub_return(struct roc_npc *roc_npc,
+ uint32_t count);
 int __roc_api roc_npc_vtag_actions_get(struct roc_npc *roc_npc);
 int __roc_api roc_npc_vtag_actions_sub_return(struct roc_npc *roc_npc,
  uint32_t count);
diff --git a/drivers/common/cnxk/roc_npc_priv.h 
b/drivers/common/cnxk/roc_npc_priv.h
index 08d763eeb4..714dcb09c9 100644
--- a/drivers/common/cnxk/roc_npc_priv.h
+++ b/drivers/common/cnxk/roc_npc_priv.h
@@ -393,6 +393,7 @@ struct npc {
uint16_t flow_prealloc_size;/* Pre allocated mcam size */
uint16_t flow_max_priority; /* Max priority for flow */
uint16_t switch_header_type; /* Supported switch header type */
+   uint32_t mark_actions;
uint32_t vtag_strip_actions; /* vtag insert/strip actions */
uint16_t pf_func;/* pf_func of device */
npc_dxcfg_t prx_dxcfg;   /* intf, lid, lt, extract */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 5d2b75fb5a..3eff3870d1 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -344,6 +344,8 @@ INTERNAL {
roc_npc_flow_parse;
roc_npc_get_low_priority_mcam;
roc_npc_init;
+   roc_npc_mark_actions_get;
+   roc_npc_mark_actions_sub_return;
roc_npc_vtag_actions_get;
roc_npc_vtag_actions_sub_return;
roc_npc_mcam_alloc_entries;
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index b84fed6d90..512b9c2597 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -39,9 +39,6 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY)
flags |= NIX_RX_OFFLOAD_SECURITY_F;
 
-   if (dev->rx_mark_update)
-   flags |= NIX_RX_OFFLOAD_MARK_UPDATE_F;
-
return flags;
 }
 
@@ -562,27 +559,6 @@ cn10k_nix_dev_start(struct rte_eth_dev *eth_dev)
return 0;
 }
 
-static int
-cn10k_nix_rx_metadata_negotiate(struct rte_eth_dev *eth_dev, uint64_t 

RE: [PATCH v3 1/3] ethdev: enable direct rearm with separate API

2023-03-07 Thread Konstantin Ananyev


> > On 3/6/2023 1:26 PM, Morten Brørup wrote:
> > >> From: Ferruh Yigit [mailto:ferruh.yi...@amd.com]
> > >> Sent: Monday, 6 March 2023 13.49
> > >>
> > >> On 1/4/2023 8:21 AM, Morten Brørup wrote:
> >  From: Feifei Wang [mailto:feifei.wa...@arm.com]
> >  Sent: Wednesday, 4 January 2023 08.31
> > 
> >  Add 'tx_fill_sw_ring' and 'rx_flush_descriptor' API into direct
> >  rearm mode for separate Rx and Tx Operation. And this can support
> >  different multiple sources in direct rearm mode. For examples, Rx
> >  driver is ixgbe, and Tx driver is i40e.
> > 
> >  Suggested-by: Honnappa Nagarahalli 
> >  Suggested-by: Ruifeng Wang 
> >  Signed-off-by: Feifei Wang 
> >  Reviewed-by: Ruifeng Wang 
> >  Reviewed-by: Honnappa Nagarahalli 
> >  ---
> > >>>
> > >>> This feature looks very promising for performance. I am pleased to
> > >>> see
> > >> progress on it.
> > >>>
> > >>
> > >> Hi Morten,
> > >>
> > >> Yes it brings some performance, but not to generic use case, only to
> > >> specific and constraint use case.
> > >
> > > I got the impression that the supported use case is a prominent and 
> > > important
> > use case.
> > >
> >
> > Can you please give real life samples for this use case, other than just 
> > showing
> > better performance number in the test bench? This helps to understand the
> > reasoning better.
> The very first patch started off with a constrained but prominent use case. 
> Though, DPU based PCIe cards running DPDK applications
> with 1 or max 2 ports being used in tons of data centers is not a secret 
> anymore and not a small use case that can be ignored.
> However, the design of the patch has changed significantly from then. Now the 
> solution can be applied to any generic use case that
> uses run-to-completion model of DPDK. i.e. the mapping of the RX and TX ports 
> can be done dynamically in the data plane threads.
> There is no need of static configuration from control plane.

+1 to this statement.
I think the authors did a good job to make it generic enough, so it can be used 
for many different cases,
plus AFAIU it doesn't introduce new implicit restrictions to the user. 
Again, this feature is totally optional, so users are free to ignore it. 
Personally I do not see any good reason why we shouldn't accept this feature 
into DPDK.
Off-course with more code reviews, extra testing, docs updates, etc.. 

> 
> On the test bench, we need to make up our mind. When we see improvements, we 
> say it is just a test bench. On other occasions
> when the test bench does not show any improvements (but improvements are 
> shown by other metrics), we say the test bench does
> not show any improvements.
> 
> >
> > > This is the primary argument for considering such a complex non-generic
> > feature.
> I am not sure what is the complexity here, can you please elaborate?
> I see other patches/designs (ex: proactive error recovery) which are way more 
> complex to understand and comprehend.
> 
> > >
> > >>
> > >> And changes are relatively invasive comparing the usecase it
> > >> supports, like it adds new two inline datapath functions and a new 
> > >> dev_ops.
> > >>
> > >> I am worried the unnecessary complexity and possible regressions in
> > >> the fundamental and simple parts of the project, with a good
> > >> intention to gain a few percentage performance in a specific usecase,
> > >> can hurt the project.
> I agree that we are touching some fundamental parts of the project. But, we 
> also need to realize that those fundamental parts were
> not developed on architectures that have joined the project way later. 
> Similarly, the use cases have evolved significantly from the
> original intended use cases. We cannot hold on to those fundamental designs 
> if they affect the performance on other architectures
> while addressing prominent new use cases.
> Please note that this patch does not break any existing features or affect 
> their performance in any negative way. The generic and
> originally intended use cases can benefit from this feature.
> 
> > >>
> > >>
> > >> I can see this is compared to MBUF_FAST_FREE feature, but
> > >> MBUF_FAST_FREE is just an offload benefiting from existing offload
> > >> infrastructure, which requires very small update and logically change
> > >> in application and simple to implement in the drivers. So, they are
> > >> not same from complexity perspective.
> > >>
> > >> Briefly, I am not comfortable with this change, I would like to see
> > >> an explicit approval and code review from techboard to proceed.
> > >
> > > I agree that the complexity is very high, and thus requires extra 
> > > consideration.
> > Your suggested techboard review and approval process seems like a good
> > solution.
> We can add to the agenda for the next Techboard meeting.
> 
> > >
> > > And the performance benefit of direct rearm should be compared to the
> > performance using the new zero-copy mempool API.
> > >
> > > -M

RE: [PATCH v3 2/3] net/i40e: enable direct rearm with separate API

2023-03-07 Thread Konstantin Ananyev


> > -邮件原件-
> > 发件人: Konstantin Ananyev 
> > 发送时间: Tuesday, February 28, 2023 3:36 AM
> > 收件人: Feifei Wang ; Konstantin Ananyev
> > ; Yuying Zhang
> > ; Beilei Xing ; Ruifeng
> > Wang 
> > 抄送: dev@dpdk.org; nd ; Honnappa Nagarahalli
> > ; nd 
> > 主题: RE: [PATCH v3 2/3] net/i40e: enable direct rearm with separate API
> >
> >
> >
> > > > > +int
> > > > > +i40e_tx_fill_sw_ring(void *tx_queue,
> > > > > + struct rte_eth_rxq_rearm_data *rxq_rearm_data) {
> > > > > + struct i40e_tx_queue *txq = tx_queue;
> > > > > + struct i40e_tx_entry *txep;
> > > > > + void **rxep;
> > > > > + struct rte_mbuf *m;
> > > > > + int i, n;
> > > > > + int nb_rearm = 0;
> > > > > +
> > > > > + if (*rxq_rearm_data->rearm_nb < txq->tx_rs_thresh ||
> > > > > + txq->nb_tx_free > txq->tx_free_thresh)
> > > > > + return 0;
> > > > > +
> > > > > + /* check DD bits on threshold descriptor */
> > > > > + if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> > > > > + rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
> > > > > +
> > > > rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
> > > > > + return 0;
> > > > > +
> > > > > + n = txq->tx_rs_thresh;
> > > > > +
> > > > > + /* first buffer to free from S/W ring is at index
> > > > > +  * tx_next_dd - (tx_rs_thresh-1)
> > > > > +  */
> > > > > + txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> > > > > + rxep = rxq_rearm_data->rx_sw_ring;
> > > > > + rxep += *rxq_rearm_data->rearm_start;
> > > > > +
> > > > > + if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > > > > + /* directly put mbufs from Tx to Rx */
> > > > > + for (i = 0; i < n; i++, rxep++, txep++)
> > > > > + *rxep = txep[0].mbuf;
> > > > > + } else {
> > > > > + for (i = 0; i < n; i++, rxep++) {
> > > > > + m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
> >
> > One thing I forgot to ask:
> > What would happen if this mbuf belongs to different mempool (not one that
> > we specify at rx_queue_setup())?
> > Do we need to check it here?
> > Or would it be upper layer constraint?
> > Or...?
> >
> 
> First, 'different mempool' is valid for no FAST_FREE path in tx_free_buffers.
> 
> If buffers belong to different mempool, we can have an example here:
> Buffer 1 from mempool 1, its recycle path is:
> -
> 1. queue_setup: rearm from mempool 1 into Rx sw-ring
> 2. rte_eth_Rx_burst: used by user app (Rx)
> 3. rte_eth_Tx_burst: mount on Tx sw-ring
> 4. rte_eth_direct_rearm: free into Rx sw-ring:
>or
> tx_free_buffers: free into mempool 1 (no fast_free path)
> -
> 
> Buffer 2 from mempool 2, its recycle path is:
> -
> 1. queue_setup: rearm from mempool 2 into Rx sw-ring
> 2. rte_eth_Rx_burst: used by user app (Rx)
> 3. rte_eth_Tx_burst: mount on Tx sw-ring
> 4. rte_eth_direct_rearm: free into Rx sw-ring
>or
> tx_free_buffers: free into mempool 2 (no fast_free_path)
> -
> 
> Thus, buffers from Tx different mempools are the same for Rx. The difference 
> point
> is that they will be freed into different mempool if the thread  uses generic 
> free buffers.
> I think this cannot affect direct-rearm mode, and we do not need to check 
> this.

I understand that it should work even with multiple mempools.
What I am trying to say - user may not want to use mbufs from particular 
mempool for RX
(while it is still ok to use it for TX).
Let say user can have a separate mempool with small data-buffers (less then 
normal MTU)  
to send some 'special' paclets, or even use this memppol with small buffers for 
zero-copy
updating of packet L2/L3 headers, etc.
Or it could be some 'special' user provided mempool.
That's why I wonder should we allow only mbufs from mempool that is assigned to 
that RX queue. 

> 
> > > > > + if (m != NULL) {
> > > > > + *rxep = m;
> > > > > + nb_rearm++;
> > > > > + }
> > > > > + }
> > > > > + n = nb_rearm;
> > > > > + }
> > > > > +
> > > > > + /* update counters for Tx */
> > > > > + txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + 
> > > > > txq->tx_rs_thresh);
> > > > > + txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + 
> > > > > txq->tx_rs_thresh);
> > > > > + if (txq->tx_next_dd >= txq->nb_tx_desc)
> > > > > + txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
> > > > > +
> > > > > + return n;
> > > > > +}
> > > > > +


RE: [PATCH 0/4] net/mlx5: add template table insertion and matching types

2023-03-07 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Alexander Kozyrev 
> Sent: Friday, January 27, 2023 1:41 AM
> To: dev@dpdk.org
> Cc: NBU-Contact-Thomas Monjalon (EXTERNAL) ;
> Ori Kam ; Raslan Darawsheh ;
> Matan Azrad ; Slava Ovsiienko
> 
> Subject: [PATCH 0/4] net/mlx5: add template table insertion and matching
> types
> 
> Add the insertion type and hash calculation function of a template table.
> Support the new rte_flow_async_create_by_index() Flow API to allow flow
> rules insertion into a specified index of index-based template tables.
> Set MLX5_LINEAR_HASH_TAG_INDEX as a special id for specifiyng the index.
> Allow to retrieve the hash calculation result via modify_field Flow API.
> 
> Flow API changes:
> https://patchwork.dpdk.org/project/dpdk/cover/20230121052158.2928238-
> 1-akozy...@nvidia.com/
> 
> Alexander Kozyrev (4):
>   net/mlx5: add table insertion type and hash function
>   net/mlx5: add flow rule insertion by index
>   net/mlx5: add hash result metadata to modify field
>   net/mlx5: define index register for linear tables
> 
>  drivers/net/mlx5/mlx5_flow.c|  61 +
>  drivers/net/mlx5/mlx5_flow.h|  15 +++
>  drivers/net/mlx5/mlx5_flow_dv.c |  18 +++-
> drivers/net/mlx5/mlx5_flow_hw.c | 157
> +++-
>  drivers/net/mlx5/rte_pmd_mlx5.h |   5 +
>  5 files changed, 249 insertions(+), 7 deletions(-)
> 
> --
> 2.18.2

Series applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


RE: [PATCH v2] doc: update recipe for static rdma-core in mlx guides

2023-03-07 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Thomas Monjalon 
> Sent: Wednesday, January 11, 2023 10:45 AM
> To: dev@dpdk.org
> Cc: Ali Alnubani ; bruce.richard...@intel.com; Matan
> Azrad ; Slava Ovsiienko 
> Subject: [PATCH v2] doc: update recipe for static rdma-core in mlx guides
> 
> With recent versions of rdma-core, it becomes important to install the library
> after its compilation.
> If including rdma-core library from its build directory, some non-standard
> compiler tricks are used.
> When using an install directory for rdma-core, DPDK compilation is fine.
> 
> While at it, disabling unneeded pyVerbs and man pages.
> 
> Signed-off-by: Thomas Monjalon 
> ---
> v2: reword commit log
> ---
>  doc/guides/nics/mlx4.rst | 3 ++-
>  doc/guides/platform/mlx5.rst | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [PATCH v3] app/testpmd: fix secondary process not forwarding

2023-03-07 Thread Ferruh Yigit
On 3/7/2023 3:25 AM, He, ShiyangX wrote:
> 
> 
>> -Original Message-
>> From: Ferruh Yigit 
>> Sent: Monday, March 6, 2023 11:06 PM
>> To: He, ShiyangX ; dev@dpdk.org
>> Cc: Zhou, YidingX ; sta...@dpdk.org; Zhang, Yuying
>> ; Singh, Aman Deep
>> ; Burakov, Anatoly
>> ; Matan Azrad ; Dmitry
>> Kozlyuk 
>> Subject: Re: [PATCH v3] app/testpmd: fix secondary process not forwarding
>>
>> On 2/23/2023 2:41 PM, Shiyang He wrote:
>>> Under multi-process scenario, the secondary process gets queue state
>>> from the wrong location (the global variable 'ports'). Therefore, the
>>> secondary process can not forward since "stream_init" is not called.
>>>
>>> This commit fixes the issue by calling 'rte_eth_rx/tx_queue_info_get'
>>> to get queue state from shared memory.
>>>
>>> Fixes: 3c4426db54fc ("app/testpmd: do not poll stopped queues")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Shiyang He 
>>> Acked-by: Yuying Zhang 
>>>
>>> v3: Add return value description
>>> ---
>>>  app/test-pmd/testpmd.c | 45
>>> --
>>>  1 file changed, 43 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>> 0c14325b8d..a050472aea 100644
>>> --- a/app/test-pmd/testpmd.c
>>> +++ b/app/test-pmd/testpmd.c
>>> @@ -2418,9 +2418,50 @@ start_packet_forwarding(int with_tx_first)
>>> if (!pkt_fwd_shared_rxq_check())
>>> return;
>>>
>>> -   if (stream_init != NULL)
>>> -   for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++)
>>> +   if (stream_init != NULL) {
>>> +   for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++) {
>>> +   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
>> {
>>> +   struct fwd_stream *fs = fwd_streams[i];
>>> +   struct rte_eth_rxq_info rx_qinfo;
>>> +   struct rte_eth_txq_info tx_qinfo;
>>> +   int32_t rc;
>>> +   rc = rte_eth_rx_queue_info_get(fs->rx_port,
>>> +   fs->rx_queue, &rx_qinfo);
>>> +   if (rc == 0) {
>>> +   ports[fs->rx_port].rxq[fs-
>>> rx_queue].state =
>>> +   rx_qinfo.queue_state;
>>> +   } else if (rc == -ENOTSUP) {
>>> +   /* Set the rxq state to
>> RTE_ETH_QUEUE_STATE_STARTED
>>> +* to ensure that the PMDs do not
>> implement
>>> +* rte_eth_rx_queue_info_get can
>> forward.
>>> +*/
>>> +   ports[fs->rx_port].rxq[fs-
>>> rx_queue].state =
>>> +
>>  RTE_ETH_QUEUE_STATE_STARTED;
>>> +   } else {
>>> +   TESTPMD_LOG(WARNING,
>>> +   "Failed to get rx queue
>> info\n");
>>> +   }
>>> +
>>> +   rc = rte_eth_tx_queue_info_get(fs->tx_port,
>>> +   fs->tx_queue, &tx_qinfo);
>>> +   if (rc == 0) {
>>> +   ports[fs->tx_port].txq[fs-
>>> tx_queue].state =
>>> +   tx_qinfo.queue_state;
>>> +   } else if (rc == -ENOTSUP) {
>>> +   /* Set the txq state to
>> RTE_ETH_QUEUE_STATE_STARTED
>>> +* to ensure that the PMDs do not
>> implement
>>> +* rte_eth_tx_queue_info_get can
>> forward.
>>> +*/
>>> +   ports[fs->tx_port].txq[fs-
>>> tx_queue].state =
>>> +
>>  RTE_ETH_QUEUE_STATE_STARTED;
>>> +   } else {
>>> +   TESTPMD_LOG(WARNING,
>>> +   "Failed to get tx queue
>> info\n");
>>> +   }
>>> +   }
>>> stream_init(fwd_streams[i]);
>>> +   }
>>> +   }
>>>
>>
>>
>> Testpmd duplicates some dpdk/ethdev state/config in application level, and
>> this can bite in multiple cases, as it is happening here.
>>
>> I am not sure if this was a design decision, but I think instead of testpmd
>> storing ethdev related state/config in application level, it should store 
>> only
>> application level state/config, and when ethdev related state/config is
>> required app should get it directly from ethdev.
>>
>> It may be too late already for testpmd, there is a mixed usage, but I am for
>> preferring this approach when there is an opportunity.
>>
>>
>>
>> For above issue, why queue state needs to be stored in application level 
>> 'port'
>> variable?
>> Where is this queue state used?
>>
>> Can it work to get queue 

[PATCH v5 0/1] devtools: add tracepoint check in checkpatch

2023-03-07 Thread Ankur Dwivedi
This patch series adds a validation in checkpatch tool to check if
tracepoint is present in any new function added in ethdev, eventdev
cryptodev and mempool library.

v5:
 - Copied the build_map_changes function from check-symbol-change.sh to
   check-tracepoint.sh.
 - Added eventdev, cryptodev and mempool in libdir in check-tracepoint.sh.
 
v4:
 - Rebased on the recent next-net branch.
 - Refined logic to find function definition.
 - Updated year in the license in devtools/check-tracepoint.sh.
 - Removed cryptodev, added ethdev in libdir in
   devtools/check-tracepoint.sh. 

v3:
 - Split the v2 patch into 2 patches.
 - The file common-func.sh is renamed to build-symbol-map.sh.
 - Removed check-tracepoint.py file.
 - Code improvements in check-tracepoint.sh.

v2:
 - Add check for parent directory.

Ankur Dwivedi (1):
  devtools: add tracepoint check in checkpatch

 devtools/check-tracepoint.sh | 223 +++
 devtools/checkpatches.sh |   9 ++
 devtools/trace-skiplist.txt  |   0
 3 files changed, 232 insertions(+)
 create mode 100755 devtools/check-tracepoint.sh
 create mode 100644 devtools/trace-skiplist.txt

-- 
2.25.1



[PATCH v5 1/1] devtools: add tracepoint check in checkpatch

2023-03-07 Thread Ankur Dwivedi
This patch adds a validation in checkpatch tool, to check if a
tracepoint is present in any new function added in cryptodev, ethdev,
eventdev and mempool library.

In this patch, the build_map_changes function is copied from
check-symbol-change.sh to check-tracepoint.sh. The check-tracepoint.sh
script uses build_map_changes function to create a map of functions.
In the map, the newly added functions, added in the experimental section
are identified and their definition are checked for the presence of
tracepoint. The checkpatch return error if the tracepoint is not present.

For functions for which trace is not needed, they can be added to
devtools/trace-skiplist.txt file. The above tracepoint check will be
skipped for them.

Signed-off-by: Ankur Dwivedi 
---
 devtools/check-tracepoint.sh | 223 +++
 devtools/checkpatches.sh |   9 ++
 devtools/trace-skiplist.txt  |   0
 3 files changed, 232 insertions(+)
 create mode 100755 devtools/check-tracepoint.sh
 create mode 100644 devtools/trace-skiplist.txt

diff --git a/devtools/check-tracepoint.sh b/devtools/check-tracepoint.sh
new file mode 100755
index 00..88d6b1fd53
--- /dev/null
+++ b/devtools/check-tracepoint.sh
@@ -0,0 +1,223 @@
+#!/bin/sh
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman 
+# Copyright(C) 2023 Marvell.
+
+selfdir=$(dirname $(readlink -f $0))
+
+# Library for trace check
+libdir="cryptodev ethdev eventdev mempool"
+
+# Functions for which the trace check can be skipped
+skiplist="$selfdir/trace-skiplist.txt"
+
+build_map_changes()
+{
+   local fname="$1"
+   local mapdb="$2"
+
+   cat "$fname" | awk '
+   # Initialize our variables
+   BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
+
+   # Anything that starts with + or -, followed by an a
+   # and ends in the string .map is the name of our map file
+   # This may appear multiple times in a patch if multiple
+   # map files are altered, and all section/symbol names
+   # appearing between a triggering of this rule and the
+   # next trigger of this rule are associated with this file
+   /[-+] [ab]\/.*\.map/ {map=$2; in_map=1; next}
+
+   # The previous rule catches all .map files, anything else
+   # indicates we left the map chunk.
+   /[-+] [ab]\// {in_map=0}
+
+   # Triggering this rule, which starts a line and ends it
+   # with a { identifies a versioned section.  The section name is
+   # the rest of the line with the + and { symbols removed.
+   # Triggering this rule sets in_sec to 1, which actives the
+   # symbol rule below
+   /^.*{/ {
+   gsub("+", "");
+   if (in_map == 1) {
+   sec=$(NF-1); in_sec=1;
+   }
+   }
+
+   # This rule identifies the end of a section, and disables the
+   # symbol rule
+   /.*}/ {in_sec=0}
+
+   # This rule matches on a + followed by any characters except a :
+   # (which denotes a global vs local segment), and ends with a ;.
+   # The semicolon is removed and the symbol is printed with its
+   # association file name and version section, along with an
+   # indicator that the symbol is a new addition.  Note this rule
+   # only works if we have found a version section in the rule
+   # above (hence the in_sec check) And found a map file (the
+   # in_map check).  If we are not in a map chunk, do nothing.  If
+   # we are in a map chunk but not a section chunk, record it as
+   # unknown.
+   /^+[^}].*[^:*];/ {gsub(";","");sym=$2;
+   if (in_map == 1) {
+   if (in_sec == 1) {
+   print map " " sym " " sec " add"
+   } else {
+   print map " " sym " unknown add"
+   }
+   }
+   }
+
+   # This is the same rule as above, but the rule matches on a
+   # leading - rather than a +, denoting that the symbol is being
+   # removed.
+   /^-[^}].*[^:*];/ {gsub(";","");sym=$2;
+   if (in_map == 1) {
+   if (in_sec == 1) {
+   print map " " sym " " sec " del"
+   } else {
+   print map " " sym " unknown del"
+   }
+   }
+   }' > "$mapdb"
+
+   sort -u "$mapdb" > "$mapdb.2"
+   mv -f "$mapdb.2" "$mapdb"
+
+}
+
+find_trace_f

Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread Ferruh Yigit
On 3/7/2023 8:25 AM, fengchengwen wrote:
> 
> 
> On 2023/3/6 19:13, Konstantin Ananyev wrote:
>>
>>
>> In the proactive error handling mode, the PMD will set the data path
>> pointers to dummy functions and then try recovery, in this period the
>> application may still invoking data path API. This will introduce a
>> race-condition with data path which may lead to crash [1].
>>
>> Although the PMD added delay after setting data path pointers to 
>> cover
>> the above race-condition, it reduces the probability, but it doesn't
>> solve the problem.
>>
>> To solve the race-condition problem fundamentally, the following
>> requirements are added:
>> 1. The PMD should set the data path pointers to dummy functions after
>> report RTE_ETH_EVENT_ERR_RECOVERING event.
>> 2. The application should stop data path API invocation when process
>> the RTE_ETH_EVENT_ERR_RECOVERING event.
>> 3. The PMD should set the data path pointers to valid functions 
>> before
>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> 4. The application should enable data path API invocation when 
>> process
>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>

 How this is solving the race-condition, by pushing responsibility to
 stop data path to application?
>>>
>>> Exactly, it becomes application responsibility to make sure data-path is
>>> stopped/suspended before recovery will continue.
>>>
>>
>> From documentation of the feature:
>>
>> ``
>> Because the PMD recovers automatically,
>> the application can only sense that the data flow is disconnected for a
>> while and the control API returns an error in this period.
>>
>> In order to sense the error happening/recovering, as well as to restore
>> some additional configuration, three events are available:
>> ``
>>
>> It looks like initial design is to use events mainly inform application
>> about what happened and mainly for re-configuration.
>>
>> Although I am don't disagree to involve the application, I am not sure
>> that is part of current design.
>
> I thought we all agreed that initial design contain some fallacies that
> need to fixed, no?
> Statement that with current rte_ethdev design error recovery can be done
> without interaction with the app (to stop/suspend data/control path)
> is the main one I think.
> It needs some interaction with app layer, one way or another.
>

 What if application is not interested in recovery modes at all and not
 registered any callback for the recovery?
>>>
>>>
>>> Are you saying there is no way for application to disable
>>> automatic recovery in PMD if it is not interested
>>> (or can't full-fill per-requesties for it)?
>>> If so, then yes it is a problem and we need to fix it.
>>> I assumed that such mechanism to disable unwanted events already exists,
>>> but I can't find anything.
>>> Wonder what would be the easiest way here - can PMD make a decision
>>> based on callback return value, or do we need a new API to
>>> enable/disable callbacks, or ...?
>>>
>>>
>>
>> As far as I can see automatic recovery is not configurable by app.
>>
>> But that is not all, PMD sends events to application but PMD can't know
>> if application is handling them or not, so with current design PMD can't
>> rely on to app.
>
> Well, PMD invokes user provided callback.
> One way to fix that problem - if there is no callback provided,
> or callback returns an error code - PMD can assume that recovery
> should not be done.
> That is probably not the best design choice, but at least it will allow
> to fix the problem without too many changes and introducing new API.
> That could be sort of a 'quick fix'.
> In a meanwhile we can think about new/better approach for that.
>

 -rc2 for 23.03 is a few days away.

 What do you think to have 'quick fix' as modifying how driver updates
 burst ops to prevent the race condition, for this release?
> 
> The 'quick fix', do you mean only update function pointer (without rxq 
> setting) ?
> Currently the PMDs which announced support "proactive error handling mode" 
> already
> do this.
> 

Yes.
I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()'
updates all fields in 'rte_eth_fp_ops' but only function pointer seems
changed in the driver, resulting only function pointers to be updated.

The discussion about race condition started with patch [1], which
mentions a crash because of a race condition. Later in discussions,
recovery event given as a sample for where the race can occur, that is
why we are here.

But after above info, althou

Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread fengchengwen
On 2023/3/7 20:07, Ferruh Yigit wrote:
> On 3/7/2023 8:25 AM, fengchengwen wrote:
>>
>>
>> On 2023/3/6 19:13, Konstantin Ananyev wrote:
>>>
>>>
>>> In the proactive error handling mode, the PMD will set the data path
>>> pointers to dummy functions and then try recovery, in this period 
>>> the
>>> application may still invoking data path API. This will introduce a
>>> race-condition with data path which may lead to crash [1].
>>>
>>> Although the PMD added delay after setting data path pointers to 
>>> cover
>>> the above race-condition, it reduces the probability, but it doesn't
>>> solve the problem.
>>>
>>> To solve the race-condition problem fundamentally, the following
>>> requirements are added:
>>> 1. The PMD should set the data path pointers to dummy functions 
>>> after
>>> report RTE_ETH_EVENT_ERR_RECOVERING event.
>>> 2. The application should stop data path API invocation when process
>>> the RTE_ETH_EVENT_ERR_RECOVERING event.
>>> 3. The PMD should set the data path pointers to valid functions 
>>> before
>>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>> 4. The application should enable data path API invocation when 
>>> process
>>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>
>
> How this is solving the race-condition, by pushing responsibility to
> stop data path to application?

 Exactly, it becomes application responsibility to make sure data-path 
 is
 stopped/suspended before recovery will continue.

>>>
>>> From documentation of the feature:
>>>
>>> ``
>>> Because the PMD recovers automatically,
>>> the application can only sense that the data flow is disconnected for a
>>> while and the control API returns an error in this period.
>>>
>>> In order to sense the error happening/recovering, as well as to restore
>>> some additional configuration, three events are available:
>>> ``
>>>
>>> It looks like initial design is to use events mainly inform application
>>> about what happened and mainly for re-configuration.
>>>
>>> Although I am don't disagree to involve the application, I am not sure
>>> that is part of current design.
>>
>> I thought we all agreed that initial design contain some fallacies that
>> need to fixed, no?
>> Statement that with current rte_ethdev design error recovery can be done
>> without interaction with the app (to stop/suspend data/control path)
>> is the main one I think.
>> It needs some interaction with app layer, one way or another.
>>
>
> What if application is not interested in recovery modes at all and not
> registered any callback for the recovery?


 Are you saying there is no way for application to disable
 automatic recovery in PMD if it is not interested
 (or can't full-fill per-requesties for it)?
 If so, then yes it is a problem and we need to fix it.
 I assumed that such mechanism to disable unwanted events already 
 exists,
 but I can't find anything.
 Wonder what would be the easiest way here - can PMD make a decision
 based on callback return value, or do we need a new API to
 enable/disable callbacks, or ...?


>>>
>>> As far as I can see automatic recovery is not configurable by app.
>>>
>>> But that is not all, PMD sends events to application but PMD can't know
>>> if application is handling them or not, so with current design PMD can't
>>> rely on to app.
>>
>> Well, PMD invokes user provided callback.
>> One way to fix that problem - if there is no callback provided,
>> or callback returns an error code - PMD can assume that recovery
>> should not be done.
>> That is probably not the best design choice, but at least it will allow
>> to fix the problem without too many changes and introducing new API.
>> That could be sort of a 'quick fix'.
>> In a meanwhile we can think about new/better approach for that.
>>
>
> -rc2 for 23.03 is a few days away.
>
> What do you think to have 'quick fix' as modifying how driver updates
> burst ops to prevent the race condition, for this release?
>>
>> The 'quick fix', do you mean only update function pointer (without rxq 
>> setting) ?
>> Currently the PMDs which announced support "proactive error handling mode" 
>> already
>> do this.
>>
> 
> Yes.
> I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()'
> updates all fields in 'rte_eth_fp_ops' but only function pointer seems
> changed in the driver, resulting only function pointers to be updated.
> 
> The discussion about race condition started with p

RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread Konstantin Ananyev


> >>> In the proactive error handling mode, the PMD will set the data 
> >>> path
> >>> pointers to dummy functions and then try recovery, in this period 
> >>> the
> >>> application may still invoking data path API. This will introduce 
> >>> a
> >>> race-condition with data path which may lead to crash [1].
> >>>
> >>> Although the PMD added delay after setting data path pointers to 
> >>> cover
> >>> the above race-condition, it reduces the probability, but it 
> >>> doesn't
> >>> solve the problem.
> >>>
> >>> To solve the race-condition problem fundamentally, the following
> >>> requirements are added:
> >>> 1. The PMD should set the data path pointers to dummy functions 
> >>> after
> >>> report RTE_ETH_EVENT_ERR_RECOVERING event.
> >>> 2. The application should stop data path API invocation when 
> >>> process
> >>> the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>> 3. The PMD should set the data path pointers to valid functions 
> >>> before
> >>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>> 4. The application should enable data path API invocation when 
> >>> process
> >>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>
> >
> > How this is solving the race-condition, by pushing responsibility to
> > stop data path to application?
> 
>  Exactly, it becomes application responsibility to make sure 
>  data-path is
>  stopped/suspended before recovery will continue.
> 
> >>>
> >>> From documentation of the feature:
> >>>
> >>> ``
> >>> Because the PMD recovers automatically,
> >>> the application can only sense that the data flow is disconnected for 
> >>> a
> >>> while and the control API returns an error in this period.
> >>>
> >>> In order to sense the error happening/recovering, as well as to 
> >>> restore
> >>> some additional configuration, three events are available:
> >>> ``
> >>>
> >>> It looks like initial design is to use events mainly inform 
> >>> application
> >>> about what happened and mainly for re-configuration.
> >>>
> >>> Although I am don't disagree to involve the application, I am not sure
> >>> that is part of current design.
> >>
> >> I thought we all agreed that initial design contain some fallacies that
> >> need to fixed, no?
> >> Statement that with current rte_ethdev design error recovery can be 
> >> done
> >> without interaction with the app (to stop/suspend data/control path)
> >> is the main one I think.
> >> It needs some interaction with app layer, one way or another.
> >>
> >
> > What if application is not interested in recovery modes at all and 
> > not
> > registered any callback for the recovery?
> 
> 
>  Are you saying there is no way for application to disable
>  automatic recovery in PMD if it is not interested
>  (or can't full-fill per-requesties for it)?
>  If so, then yes it is a problem and we need to fix it.
>  I assumed that such mechanism to disable unwanted events already 
>  exists,
>  but I can't find anything.
>  Wonder what would be the easiest way here - can PMD make a decision
>  based on callback return value, or do we need a new API to
>  enable/disable callbacks, or ...?
> 
> 
> >>>
> >>> As far as I can see automatic recovery is not configurable by app.
> >>>
> >>> But that is not all, PMD sends events to application but PMD can't 
> >>> know
> >>> if application is handling them or not, so with current design PMD 
> >>> can't
> >>> rely on to app.
> >>
> >> Well, PMD invokes user provided callback.
> >> One way to fix that problem - if there is no callback provided,
> >> or callback returns an error code - PMD can assume that recovery
> >> should not be done.
> >> That is probably not the best design choice, but at least it will allow
> >> to fix the problem without too many changes and introducing new API.
> >> That could be sort of a 'quick fix'.
> >> In a meanwhile we can think about new/better approach for that.
> >>
> >
> > -rc2 for 23.03 is a few days away.
> >
> > What do you think to have 'quick fix' as modifying how driver updates
> > burst ops to prevent the race condition, for this release?
> >>
> >> The 'quick fix', do you mean only update function pointer (without rxq 
> >> setting) ?
> >> Currently the PMDs which announced support "proactive error handling mode" 
> >> already
> >> do this.
> >>
> >
> > Yes.
> > I checked hns3, it does as you said, hns3_eth_dev_fp_ops_conf

RE: [PATCH 2/2] net/mlx5: add MPLS tunnel support for HWS

2023-03-07 Thread Slava Ovsiienko
> -Original Message-
> From: Michael Baum 
> Sent: среда, 8 февраля 2023 г. 08:19
> To: dev@dpdk.org
> Cc: Matan Azrad ; Raslan Darawsheh
> ; Slava Ovsiienko ; Ori Kam
> 
> Subject: [PATCH 2/2] net/mlx5: add MPLS tunnel support for HWS
> 
> Add support for MPLS tunnel item in HWS.
> 
> Signed-off-by: Michael Baum 
> Acked-by: Ori Kam 
Acked-by: Viacheslav Ovsiienko 



RE: [PATCH v3 1/2] net/mlx5/hws: support matching on MPLSoUDP

2023-03-07 Thread Slava Ovsiienko
> -Original Message-
> From: Michael Baum 
> Sent: четверг, 23 февраля 2023 г. 09:48
> To: dev@dpdk.org
> Cc: Matan Azrad ; Raslan Darawsheh
> ; Slava Ovsiienko ; Erez Shitrit
> 
> Subject: [PATCH v3 1/2] net/mlx5/hws: support matching on MPLSoUDP
> 
> From: Erez Shitrit 
> 
> Add support for matching MPLS labels while it is under UDP protocol.
> Matching up to 5 MPLS labels with or without the MPLS value.
> 
> Signed-off-by: Erez Shitrit 
Acked-by: Viacheslav Ovsiienko 



Re: [PATCH] net/nfp: write link speed to control BAR

2023-03-07 Thread Ferruh Yigit
On 3/6/2023 7:06 AM, Chaoyong He wrote:
>> On 2/21/2023 6:29 AM, Chaoyong He wrote:
>>> From: James Hershaw 
>>>
>>> Due to changes in the firmware for NFPs, firmware will no longer write
>>> the link speed of a port to the control BAR. In line with the
>>> behaviour of the kernel NFP driver, this is now handled by the PMD by
>>> reading the value provided by the NSP in the nfp_eth_table struct
>>> within the pf_dev of the port and subsequently writing this value to the
>> control BAR.
>>>
>>
>> Don't you need some kind of FW version check to figure out if
>> 'NFP_NET_CFG_STS_NSP_LINK_RATE' needs to be updated by driver or not?
>>
>> How do you manage driver <-> FW dependency?
>>
>>
>>> Signed-off-by: James Hershaw 
>>> Reviewed-by: Niklas Söderlund 
>>> Reviewed-by: Chaoyong He 
>>> ---
>>>  drivers/net/nfp/nfp_common.c | 90 ++---
>> ---
>>>  drivers/net/nfp/nfp_ctrl.h   |  9 
>>>  2 files changed, 65 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/drivers/net/nfp/nfp_common.c
>>> b/drivers/net/nfp/nfp_common.c index 5922bfea8e..006ea58008 100644
>>> --- a/drivers/net/nfp/nfp_common.c
>>> +++ b/drivers/net/nfp/nfp_common.c
>>> @@ -52,6 +52,53 @@
>>>  #include 
>>>  #include 
>>>
>>> +static const uint32_t nfp_net_link_speed_nfp2rte[] = {
>>> +   [NFP_NET_CFG_STS_LINK_RATE_UNSUPPORTED] =
>> RTE_ETH_SPEED_NUM_NONE,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_UNKNOWN] =
>> RTE_ETH_SPEED_NUM_NONE,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_1G]  =
>> RTE_ETH_SPEED_NUM_1G,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_10G] =
>> RTE_ETH_SPEED_NUM_10G,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_25G] =
>> RTE_ETH_SPEED_NUM_25G,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_40G] =
>> RTE_ETH_SPEED_NUM_40G,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_50G] =
>> RTE_ETH_SPEED_NUM_50G,
>>> +   [NFP_NET_CFG_STS_LINK_RATE_100G]=
>> RTE_ETH_SPEED_NUM_100G,
>>> +};
>>> +
>>> +static uint32_t
>>> +nfp_net_link_speed_rte2nfp(uint32_t speed) {
>>> +   uint32_t i;
>>> +
>>> +   for (i = 0; i < RTE_DIM(nfp_net_link_speed_nfp2rte); i++) {
>>> +   if (speed == nfp_net_link_speed_nfp2rte[i])
>>> +   return i;
>>> +   }
>>> +
>>> +   return NFP_NET_CFG_STS_LINK_RATE_UNKNOWN;
>>> +}
>>> +
>>> +static void
>>> +nfp_net_notify_port_speed(struct rte_eth_dev *dev) {
>>> +   struct nfp_net_hw *hw;
>>> +   struct nfp_eth_table *eth_table;
>>> +   uint32_t nn_link_status;
>>> +
>>> +   hw = NFP_NET_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> +   eth_table = hw->pf_dev->nfp_eth_table;
>>> +
>>> +   nn_link_status = nn_cfg_readl(hw, NFP_NET_CFG_STS);
>>> +   nn_link_status = (nn_link_status >>
>> NFP_NET_CFG_STS_LINK_RATE_SHIFT) &
>>> +   NFP_NET_CFG_STS_LINK_RATE_MASK;
>>> +
>>> +   if ((nn_link_status & NFP_NET_CFG_STS_LINK) == 0) {
>>> +   nn_cfg_writel(hw, NFP_NET_CFG_STS_NSP_LINK_RATE,
>> NFP_NET_CFG_STS_LINK_RATE_UNKNOWN);
>>> +   return;
>>> +   }
>>> +
>>> +   nn_cfg_writel(hw, NFP_NET_CFG_STS_NSP_LINK_RATE,
>>> + nfp_net_link_speed_rte2nfp(eth_table->ports[hw-
>>> idx].speed));
>>
>> PF driver writes link speed to 'NFP_NET_CFG_STS_NSP_LINK_RATE' register,
>> but 'nfp_net_link_update()' still gets it from 'NFP_NET_CFG_STS'
>> register (via 'nfp_net_link_speed_nfp2rte[nn_link_status]').
>>
>> Shouldn't 'nfp_net_link_update()' needs to be updated to read speed from
>> 'NFP_NET_CFG_STS_NSP_LINK_RATE' register?
> 
> Sorry for the late response, we spend a lot of time to check and discuss.
> 
> For older firmware, a full word is allocated (NFP_NET_CFG_STS) to report link 
> status and port speed to the driver.
> However, in the interests of keeping FW files port-speed agnostic in the 
> future, 
> the upper 16 bits are no longer written to by FW, so we write the speed to 
> that address (NFP_NET_CFG_STS_LINK_RATE).
> The lower 16 bits (link status) are still handled by firmware.
> 

But 'nfp_net_link_update()' still gets the links speed from lower 16
bits. Probably I am missing something but please let me understand.


link_update()  notify_port_speed()
 read(speed)writel(speed)
  ▲  │
  │  │
  │  │
 ┌┴─┐▼┐
 │  │ │
 │  │LINK_RATE│
 └──┴─┘
0x34   0x36
 ││
 └ CFG_STS ───┘


Or is it something like when you update upper half of the register, FW
reads it and reflects the value to the lower half of the register?


And since 'NFP_NET_CFG_STS_NSP_LINK_RATE' is 16 bits, is it correct to
use 'nn_cfg_writel()' to update it?

> These changes are completely backwards compatible with older firmware 
> versions, so no FW version check is required.

ack


Re: [PATCH] service: split tests to perf and autotest to avoid spurious CI failures

2023-03-07 Thread David Marchand
On Fri, Mar 3, 2023 at 11:59 AM Van Haaren, Harry
 wrote:
> > > +   .setup = testsuite_setup,
> > > +   .teardown = testsuite_teardown,
> > > +   .unit_test_cases = {
> > > +   TEST_CASE_ST(dummy_register, NULL, service_attr_get),
> > > +   TEST_CASE_ST(dummy_register, NULL, 
> > > service_lcore_attr_get),
> > > +   TEST_CASE_ST(dummy_register, NULL, 
> > > service_lcore_start_stop),
> >
> > Looking at service_lcore_running_check(), don't you think
> > service_may_be_active() and service_active_two_cores() are also
> > subject to race?
>
> Perhaps, but those haven't *actually* been failing in any of the reports.
> I'd prefer leave tests running if they're not causing issues in the CI.

service_may_be_active did fail in the near past (report from October
that triggered the discussion and the timeout extension patch).
So my fear is that we will see some ocurrences.

Time will tell.


-- 
David Marchand



Re: [PATCH v2] service: split tests to perf and autotest to avoid spurious CI failures

2023-03-07 Thread David Marchand
On Fri, Mar 3, 2023 at 2:01 PM Harry van Haaren
 wrote:
>
> On some CI runs, some service-cores tests spuriously fail as the
> service lcore thread is not actually scheduled by the OS in the
> given amount of time.
>
> Increasing timeouts has not resolved the issue in the CI, so the
> solution in this patch is to move them to a separate perf test
> suite.
>
> Signed-off-by: Harry van Haaren 

Applied, thanks.


-- 
David Marchand



Re: [PATCH v2] eal: fix thread race in control thread creation

2023-03-07 Thread David Marchand
On Thu, Mar 2, 2023 at 5:18 AM Honnappa Nagarahalli
 wrote:
> > > >   if (ctrl_thread_init(arg) != 0)
> > > >   return NULL;
> > > >
> > > > - return start_routine(params->arg);
> > > > + return start_routine(start_arg);
> > > We can free 'params' here after 'start_routine' returns.
> >
> > I guess it doesn't matter if the allocation is retained for the duration of
> > start_routine() which could be ~long.
> Yes, that's what I thought, it is a small size.
>
> >
> > David/Honnappah let me know what you decide. if you'd prefer to change to
> > honnappah's suggestion i'll put a new version up.

Hum, how would it look like?
- the parent thread would free params if the child thread fails to set affinity,
- the child thread would free params otherwise,

Is this what you propose?

This works, but we have to be extra careful if any change is done in
this part later.


-- 
David Marchand



Re: [PATCH v9 01/21] net/cpfl: support device initialization

2023-03-07 Thread Ferruh Yigit
On 3/2/2023 9:20 PM, Mingxia Liu wrote:
> Support device init and add the following dev ops:
>  - dev_configure
>  - dev_close
>  - dev_infos_get
>  - link_update
>  - dev_supported_ptypes_get
> 
> Signed-off-by: Mingxia Liu 

<...>

> +static void
> +cpfl_handle_virtchnl_msg(struct cpfl_adapter_ext *adapter)
> +{
> + struct idpf_adapter *base = &adapter->base;
> + struct idpf_dma_mem *dma_mem = NULL;
> + struct idpf_hw *hw = &base->hw;
> + struct virtchnl2_event *vc_event;
> + struct idpf_ctlq_msg ctlq_msg;
> + enum idpf_mbx_opc mbx_op;
> + struct idpf_vport *vport;
> + enum virtchnl_ops vc_op;
> + uint16_t pending = 1;
> + int ret;
> +
> + while (pending) {
> + ret = idpf_vc_ctlq_recv(hw->arq, &pending, &ctlq_msg);
> + if (ret) {
> + PMD_DRV_LOG(INFO, "Failed to read msg from virtual 
> channel, ret: %d", ret);
> + return;
> + }
> +
> + memcpy(base->mbx_resp, ctlq_msg.ctx.indirect.payload->va,
> +IDPF_DFLT_MBX_BUF_SIZE);
> +
> + mbx_op = rte_le_to_cpu_16(ctlq_msg.opcode);
> + vc_op = rte_le_to_cpu_32(ctlq_msg.cookie.mbx.chnl_opcode);
> + base->cmd_retval = 
> rte_le_to_cpu_32(ctlq_msg.cookie.mbx.chnl_retval);
> +
> + switch (mbx_op) {
> + case idpf_mbq_opc_send_msg_to_peer_pf:
> + if (vc_op == VIRTCHNL2_OP_EVENT) {


Raslan reported following build error [1], 'VIRTCHNL2_OP_EVENT' is not
an element of "enum virtchnl_ops", can you please check?


I guess there are a few options, have a new enum for virtchnl2, like
"enum virtchnl2_ops" which inlucde all 'VIRTCHNL2_OP_',

OR

use 'uint32_t' type (instead of "enum virtchnl_ops") when
'VIRTCHNL2_OP_' opcodes can be used, this seems simpler.


BTW, this is same in the idfp driver.


[1]
drivers/libtmp_rte_net_cpfl.a.p/net_cpfl_cpfl_ethdev.c.o -c
../../root/dpdk/drivers/net/cpfl/cpfl_ethdev.c
../../root/dpdk/drivers/net/cpfl/cpfl_ethdev.c:1118:14: error:
comparison of constant 522 with expression of type 'enum virtchnl_ops'
is always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (vc_op == VIRTCHNL2_OP_EVENT) {
~ ^  ~~
1 error generated.



Re: [PATCH] net/nfp: reset profile config while modify profile

2023-03-07 Thread Ferruh Yigit
On 2/28/2023 8:56 AM, Chaoyong He wrote:
> From: Jin Liu 
> 
> While changing meter profile from pps rate limit mode to bps,
> the profile configuration was not reset, leaving the profile
> in pps mode. This lead to incorrect operation.
> 
> Fix this by clearing the profile before configuring it.
> 
> Fixes: 434c66e7e55c ("net/nfp: add meter profile options")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jin Liu 
> Reviewed-by: Chaoyong He 
> Reviewed-by: Niklas Söderlund 

Applied to dpdk-next-net/main, thanks.


Re: [PATCH 1/2] eal: fix failure race and behavior of thread create

2023-03-07 Thread David Marchand
On Thu, Mar 2, 2023 at 7:44 PM Tyler Retzlaff
 wrote:
>
> In rte_thread_create setting affinity after pthread_create may fail.
> Such a failure should result in the entire rte_thread_create failing
> but doesn't.
>
> Additionally if there is a failure to set affinity a race exists where
> the creating thread will free ctx and depending on scheduling of the new
> thread it may also free ctx (double free).
>
> Resolve both of the above issues by using the pthread_setaffinity_np
> prior to thread creation to set the affinity of the created thread. By
> doing this no failure paths exist after pthread_create returns
> successfully.
>
> Fixes: ce6e911d20f6 ("eal: add thread lifetime API")
> Cc: sta...@dpdk.org
> Cc: roret...@linux.microsoft.com
>
> Signed-off-by: Tyler Retzlaff 

Reviewed-by: David Marchand 


-- 
David Marchand



Re: [PATCH v9 01/21] net/cpfl: support device initialization

2023-03-07 Thread Ferruh Yigit
On 3/7/2023 2:11 PM, Ferruh Yigit wrote:
> On 3/2/2023 9:20 PM, Mingxia Liu wrote:
>> Support device init and add the following dev ops:
>>  - dev_configure
>>  - dev_close
>>  - dev_infos_get
>>  - link_update
>>  - dev_supported_ptypes_get
>>
>> Signed-off-by: Mingxia Liu 
> 
> <...>
> 
>> +static void
>> +cpfl_handle_virtchnl_msg(struct cpfl_adapter_ext *adapter)
>> +{
>> +struct idpf_adapter *base = &adapter->base;
>> +struct idpf_dma_mem *dma_mem = NULL;
>> +struct idpf_hw *hw = &base->hw;
>> +struct virtchnl2_event *vc_event;
>> +struct idpf_ctlq_msg ctlq_msg;
>> +enum idpf_mbx_opc mbx_op;
>> +struct idpf_vport *vport;
>> +enum virtchnl_ops vc_op;
>> +uint16_t pending = 1;
>> +int ret;
>> +
>> +while (pending) {
>> +ret = idpf_vc_ctlq_recv(hw->arq, &pending, &ctlq_msg);
>> +if (ret) {
>> +PMD_DRV_LOG(INFO, "Failed to read msg from virtual 
>> channel, ret: %d", ret);
>> +return;
>> +}
>> +
>> +memcpy(base->mbx_resp, ctlq_msg.ctx.indirect.payload->va,
>> +   IDPF_DFLT_MBX_BUF_SIZE);
>> +
>> +mbx_op = rte_le_to_cpu_16(ctlq_msg.opcode);
>> +vc_op = rte_le_to_cpu_32(ctlq_msg.cookie.mbx.chnl_opcode);
>> +base->cmd_retval = 
>> rte_le_to_cpu_32(ctlq_msg.cookie.mbx.chnl_retval);
>> +
>> +switch (mbx_op) {
>> +case idpf_mbq_opc_send_msg_to_peer_pf:
>> +if (vc_op == VIRTCHNL2_OP_EVENT) {
> 
> 
> Raslan reported following build error [1], 'VIRTCHNL2_OP_EVENT' is not
> an element of "enum virtchnl_ops", can you please check?
> 
> 
> I guess there are a few options, have a new enum for virtchnl2, like
> "enum virtchnl2_ops" which inlucde all 'VIRTCHNL2_OP_',
> 
> OR
> 
> use 'uint32_t' type (instead of "enum virtchnl_ops") when
> 'VIRTCHNL2_OP_' opcodes can be used, this seems simpler.
> 
> 
> BTW, this is same in the idfp driver.
> 
> 
> [1]
> drivers/libtmp_rte_net_cpfl.a.p/net_cpfl_cpfl_ethdev.c.o -c
> ../../root/dpdk/drivers/net/cpfl/cpfl_ethdev.c
> ../../root/dpdk/drivers/net/cpfl/cpfl_ethdev.c:1118:14: error:
> comparison of constant 522 with expression of type 'enum virtchnl_ops'
> is always false [-Werror,-Wtautological-constant-out-of-range-compare]
> if (vc_op == VIRTCHNL2_OP_EVENT) {
> ~ ^  ~~
> 1 error generated.
> 

Thinking twice, I am not sure if this a compiler issue or coding issue,
many compilers doesn't complain about above issue.

As far as I understand C allows assigning unlisted values to enums,
because underneath it just uses an integer type.

Only caveat I can see is, the integer type used is not fixed,
technically compiler can select the type that fits all enum values, so
for above enum compiler can select an char type to store the values, but
fixed value is 522 out of the char limit may cause an issue. But in
practice I am not sure if compilers are selecting char as underlying
type, or if they all just use 'int'.



[PATCH] doc: fix naming of NVIDIA devices

2023-03-07 Thread Thomas Monjalon
The networking NVIDIA devices may be named as the following,
using 0 and Ax as fake suffixes:
- ConnectX-0
- ConnectX-0 Ax
- BlueField-0

Casing and missing hyphens are fixed to avoid future wrong copy/paste.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/nics/mlx5.rst   | 8 
 doc/guides/platform/bluefield.rst  | 2 +-
 doc/guides/rel_notes/release_20_08.rst | 2 +-
 doc/guides/rel_notes/release_20_11.rst | 2 +-
 doc/guides/rel_notes/release_21_02.rst | 4 ++--
 doc/guides/rel_notes/release_21_05.rst | 2 +-
 doc/guides/rel_notes/release_21_08.rst | 2 +-
 doc/guides/rel_notes/release_21_11.rst | 4 ++--
 doc/guides/rel_notes/release_22_03.rst | 2 +-
 doc/guides/rel_notes/release_22_07.rst | 4 ++--
 drivers/net/mlx5/mlx5_mac.c| 2 +-
 11 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6510e74fb9..b4714c0803 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -150,7 +150,7 @@ Limitations
 
 - Host shaper:
 
-  - Support BlueField series NIC from BlueField 2.
+  - Support BlueField series NIC from BlueField-2.
   - When configuring host shaper with 
MLX5_HOST_SHAPER_FLAG_AVAIL_THRESH_TRIGGERED flag set,
 only rates 0 and 100Mbps are supported.
 
@@ -398,9 +398,9 @@ Limitations
 
 - E-Switch Manager matching:
 
-  - For Bluefield with old FW
+  - For BlueField with old FW
 which doesn't expose the E-Switch Manager vport ID in the capability,
-matching E-Switch Manager should be used only in Bluefield embedded CPU 
mode.
+matching E-Switch Manager should be used only in BlueField embedded CPU 
mode.
 
 - Raw encapsulation:
 
@@ -1916,7 +1916,7 @@ Then the PMD call the callback registered previously,
 which will delay a while to let Rx queue empty,
 then disable host shaper.
 
-Let's assume we have a simple BlueField 2 setup:
+Let's assume we have a simple BlueField-2 setup:
 port 0 is uplink, port 1 is VF representor.
 Each port has 2 Rx queues.
 To control traffic from the host to the Arm device,
diff --git a/doc/guides/platform/bluefield.rst 
b/doc/guides/platform/bluefield.rst
index 09486747b1..98df515241 100644
--- a/doc/guides/platform/bluefield.rst
+++ b/doc/guides/platform/bluefield.rst
@@ -18,7 +18,7 @@ and common offload HW drivers of **NVIDIA BlueField** family 
SoC.
 Supported BlueField Platforms
 -
 
-- `BlueField 2 
`_
+- `BlueField-2 
`_
 
 
 Common Offload HW Drivers
diff --git a/doc/guides/rel_notes/release_20_08.rst 
b/doc/guides/rel_notes/release_20_08.rst
index 445e40fbac..619c3dd59f 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -466,7 +466,7 @@ Tested Platforms
 
 * Mellanox\ |reg| BlueField\ |reg| SmartNIC
 
-  * Mellanox\ |reg| BlueField\ |reg| 2 SmartNIC MT41686 - MBF2H332A-AEEOT 
(2x25G)
+  * Mellanox\ |reg| BlueField\ |reg|-2 SmartNIC MT41686 - MBF2H332A-AEEOT 
(2x25G)
 
 * Host interface: PCI Express 3.0 x16
 * Device ID: 15b3:a2d2
diff --git a/doc/guides/rel_notes/release_20_11.rst 
b/doc/guides/rel_notes/release_20_11.rst
index 7fd15398e4..76fd7372b8 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -880,7 +880,7 @@ Tested Platforms
 
 * Mellanox\ |reg| BlueField\ |reg| SmartNIC
 
-  * Mellanox\ |reg| BlueField\ |reg| 2 SmartNIC MT41686 - MBF2H332A-AEEOT 
(2x25G)
+  * Mellanox\ |reg| BlueField\ |reg|-2 SmartNIC MT41686 - MBF2H332A-AEEOT 
(2x25G)
 
 * Host interface: PCI Express 3.0 x16
 * Device ID: 15b3:a2d2
diff --git a/doc/guides/rel_notes/release_21_02.rst 
b/doc/guides/rel_notes/release_21_02.rst
index 5fbf5b3d43..b47593b214 100644
--- a/doc/guides/rel_notes/release_21_02.rst
+++ b/doc/guides/rel_notes/release_21_02.rst
@@ -135,7 +135,7 @@ New Features
 
 * **Added mlx5 compress PMD.**
 
-  Added a new compress PMD for Bluefield 2 adapters.
+  Added a new compress PMD for BlueField-2 adapters.
 
   See the :doc:`../compressdevs/mlx5` for more details.
 
@@ -443,7 +443,7 @@ Tested Platforms
 
 * Mellanox\ |reg| BlueField\ |reg| SmartNIC
 
-  * Mellanox\ |reg| BlueField\ |reg| 2 SmartNIC MT41686 - MBF2H332A-AEEOT 
(2x25G)
+  * Mellanox\ |reg| BlueField\ |reg|-2 SmartNIC MT41686 - MBF2H332A-AEEOT 
(2x25G)
 
 * Host interface: PCI Express 3.0 x16
 * Device ID: 15b3:a2d2
diff --git a/doc/guides/rel_notes/release_21_05.rst 
b/doc/guides/rel_notes/release_21_05.rst
index 49044ed422..4dc0573266 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -566,7 +566,7 @@ Tested Platforms
 
 * Mellanox\ |reg| BlueField\ |reg| SmartNIC
 
-  * Mellanox\ |reg| BlueField\ |reg| 2 SmartNIC MT41686 - MBF2H332A-AEEOT_A1 
(2x25G)
+  * Mellanox\ |reg| BlueField\ |reg|-2 SmartNIC MT41686 - MBF2H33

Re: [PATCH] net/nfp: fix the offload of multiple output actions

2023-03-07 Thread Ferruh Yigit
On 2/28/2023 9:03 AM, Chaoyong He wrote:
> When offload the flow with multiple output actions, the
> original logic add a flag to every output action wrongly,
> and this cause only the first output action can take effect.
> Fix it by only add the flag to the last output action.
> 
> Fixes: 4d946034bf9c ("net/nfp: support basic flow actions")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Chaoyong He 
> Reviewed-by: Niklas Söderlund 

Applied to dpdk-next-net/main, thanks.


Re: [PATCH v3 0/2] Add option to timestamp console log

2023-03-07 Thread Stephen Hemminger
On Tue, 7 Mar 2023 08:33:17 +0100
Morten Brørup  wrote:

> Here's some feature creep: Since the timestamp output is configurable, there 
> is no need to settle on one specific timestamp format. The option could allow 
> a choice between MONOTONIC and REALTIME, with an option for REALTIME to 
> include the date in ISO 8601 format (-MM-DD). And REALTIME could be UTC 
> or local time. You could consider taking a format string for strftime(), with 
> the extension that %f expands to 6 digit microseconds like in Python [1].

Does that have compiler static checking issues if you allow user to set formats?


Re: [PATCH v3 2/2] eal: add option to put timestamp on console output

2023-03-07 Thread Stephen Hemminger
On Tue, 7 Mar 2023 17:35:32 +0800
fengchengwen  wrote:

> On 2023/3/7 3:28, Stephen Hemminger wrote:
> > When debugging driver or startup issues, it is useful to have
> > a timestamp on each message printed. The messages in syslog
> > already have a timestamp, but often syslog is not available
> > during testing. The timestamp format is chosen to look
> > like the default Linux dmesg timestamp.
> > 
> > Example:
> > [   0.40] EAL: Probing VFIO support...
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  .../freebsd_gsg/freebsd_eal_parameters.rst| 32 ++
> >  doc/guides/linux_gsg/linux_eal_parameters.rst |  5 +++
> >  lib/eal/common/eal_common_options.c   |  5 +++
> >  lib/eal/common/eal_internal_cfg.h |  1 +
> >  lib/eal/common/eal_options.h  |  2 +
> >  lib/eal/unix/eal_log.c| 42 +--
> >  6 files changed, 84 insertions(+), 3 deletions(-)
> > 
> > diff --git a/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst 
> > b/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst
> > index fba467a2ce92..99cff10e963c 100644
> > --- a/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst
> > +++ b/doc/guides/freebsd_gsg/freebsd_eal_parameters.rst
> > @@ -18,3 +18,35 @@ FreeBSD-specific EAL parameters
> >  ---
> >  
> >  There are currently no FreeBSD-specific EAL command-line parameters 
> > available.
> > +
> > +Other options
> > +~
> > +
> > +*   ``--syslog ``
> > +
> > +Set syslog facility. Valid syslog facilities are::
> > +
> > +auth
> > +cron
> > +daemon
> > +ftp
> > +kern
> > +lpr
> > +mail
> > +news
> > +syslog
> > +user
> > +uucp
> > +local0
> > +local1
> > +local2
> > +local3
> > +local4
> > +local5
> > +local6
> > +local7  
> 
> This should add to commit 1/2 [PATCH v3 1/2] eal: unify logging code for 
> FreeBsd and Linux
> 
> > +
> > +*   ``--log-timestamp``
> > +
> > +Add a timestamp of seconds and microseconds to each log message
> > +written to standard output.
> > diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst 
> > b/doc/guides/linux_gsg/linux_eal_parameters.rst
> > index ea8f38139119..719ca6851625 100644
> > --- a/doc/guides/linux_gsg/linux_eal_parameters.rst
> > +++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
> > @@ -135,3 +135,8 @@ Other options
> >  local5
> >  local6
> >  local7
> > +
> > +*   ``--log-timestamp``
> > +
> > +Add a timestamp of seconds and microseconds to each log message
> > +written to standard output.
> > diff --git a/lib/eal/common/eal_common_options.c 
> > b/lib/eal/common/eal_common_options.c
> > index 03059336987d..2d3d8e82f7f3 100644
> > --- a/lib/eal/common/eal_common_options.c
> > +++ b/lib/eal/common/eal_common_options.c
> > @@ -76,6 +76,7 @@ eal_long_options[] = {
> > {OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM},
> > {OPT_LCORES,1, NULL, OPT_LCORES_NUM   },
> > {OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM},
> > +   {OPT_LOG_TIMESTAMP, 0, NULL, OPT_LOG_TIMESTAMP_NUM},
> > {OPT_TRACE, 1, NULL, OPT_TRACE_NUM},
> > {OPT_TRACE_DIR, 1, NULL, OPT_TRACE_DIR_NUM},
> > {OPT_TRACE_BUF_SIZE,1, NULL, OPT_TRACE_BUF_SIZE_NUM   },
> > @@ -1833,6 +1834,9 @@ eal_parse_common_option(int opt, const char *optarg,
> > }
> > break;
> > }
> > +   case OPT_LOG_TIMESTAMP_NUM:
> > +   conf->log_timestamp = 1;
> > +   break;
> >  
> >  #ifndef RTE_EXEC_ENV_WINDOWS
> > case OPT_TRACE_NUM: {
> > @@ -2194,6 +2198,7 @@ eal_common_usage(void)
> >"  --"OPT_PROC_TYPE" Type of this process 
> > (primary|secondary|auto)\n"
> >  #ifndef RTE_EXEC_ENV_WINDOWS
> >"  --"OPT_SYSLOG"Set syslog facility\n"
> > +  "  --"OPT_LOG_TIMESTAMP" Timestamp log output\n"
> >  #endif
> >"  --"OPT_LOG_LEVEL"= Set global log level\n"
> >"  --"OPT_LOG_LEVEL"=:\n"
> > diff --git a/lib/eal/common/eal_internal_cfg.h 
> > b/lib/eal/common/eal_internal_cfg.h
> > index 167ec501fa79..33144c3619dd 100644
> > --- a/lib/eal/common/eal_internal_cfg.h
> > +++ b/lib/eal/common/eal_internal_cfg.h
> > @@ -85,6 +85,7 @@ struct internal_config {
> >  * per-node) non-legacy mode only.
> >  */
> > volatile int syslog_facility; /**< facility passed to openlog() */
> > +   volatile uint8_t log_timestamp;   /**< add timestamp to console output 
> > */
> > /** default interrupt mode for VFIO */
> > volatile enum rte_intr_mode vfio_intr_mode;
> > /** the shared VF token for VFIO-PCI bound PF and VFs devices */
> > diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
> > index 3cc9cb641284..cc9723868e3c 100644
> > ---

Re: [PATCH v3 2/2] eal: add option to put timestamp on console output

2023-03-07 Thread Stephen Hemminger
On Tue, 7 Mar 2023 17:35:32 +0800
fengchengwen  wrote:

> The syslog will add timestamp, but the syslog backend will re-write 
> timestamp, so
> in the last, you can't find the real-timestamp of this log print. sometimes 
> it requires
> to get real log time.
> PS: we found it in our test environment because RR schedule hang too long 
> (similar question
> also found: https://bugzilla.redhat.com/show_bug.cgi?id=1855447).
> 
> So suggest add timestamp in syslog string also, and don't convert to 
> monotonic and just
> print as normal format (just like syslog).


Are you using systemd?

Never, never configure a DPDK application with real-time process priority.
Polling model and RT don't mix.


RE: [PATCH v2] net/mlx5: use just sufficient barrier for Arm platforms

2023-03-07 Thread Slava Ovsiienko
Hi, Honnappa

I'm sorry for delay - I had to play with patch, to compile this one with 
assembly listings
and check what code is actually generated on x86/ARM.

On x86 there is no difference at all (with and w/o patch), so no objection from 
my side.
On ARM we have:
w/o patch: dmb oshld
with patch: dmb ishld

What is the purpose of the barrier - to not allow the CQE read access 
reordering.
On x86, "compiler barrier" is quite enough (due to strong load/store ordering).
On ARM load/store might be reordered, AFAIU.

CQE resides in the host memory and can be directly written by the NIC via PCIe.
(In my understanding it can cause core cache(s) invalidations). 
I have a question - should we consider this as outer sharable domain?
Is it safe to have a barrier for inner domain only in our case?

We have updated the cqe_check() routine, sorry for this. Could you, please,
update the patch and send v3?

With best regards,
Slava

> -Original Message-
> From: Honnappa Nagarahalli 
> Sent: вторник, 15 ноября 2022 г. 03:46
> To: Slava Ovsiienko ; dev@dpdk.org; Ruifeng Wang
> ; Matan Azrad ; Shahaf Shuler
> 
> Cc: nd ; Matan Azrad ; sta...@dpdk.org;
> nd 
> Subject: RE: [PATCH v2] net/mlx5: use just sufficient barrier for Arm 
> platforms
> 
> 
> 
> >
> > >
> > > Hi, Honnappa
> > Hi Slava, thanks for the feedback.
> >
> > >
> > > We discussed the barrier here:
> > > http://patches.dpdk.org/project/dpdk/patch/20210606164948.35997-1-
> > > honnappa.nagaraha...@arm.com/
> > Yes, I have changed the patch according to the discussion. i.e.
> > barrier is needed, but different (inner sharable domain) barrier is 
> > required.
> >
> > >
> > > (BTW, it is good practice to keep the reference to previous patch
> > > versions below Commit Message of the next ones).
> > >
> > > This barrier is not about compiler ordering, it is about external HW
> > > agent memory action completions.
> > > So, I'm not sure the rte_atomic_thread_fence() is safe for x86 -
> > > patch impacts
> > > x86 as well.
> > The earlier barrier 'rte_io_rmb()', resolves to a compiler barrier on x86 
> > [1].
> > The rte_atomic_thread_fence(__ATOMIC_ACQUIRE) on x86 also acts as a
> > compiler barrier. So, there is no change for x86.
> >
> >
> > [1]
> > https://github.com/DPDK/dpdk/blob/main/lib/eal/x86/include/rte_atomic.
> > h#
> > L80
> Hi Slava, any more comments on this?
> 
> >
> > >
> > > With best regards,
> > > Slava
> > >
> > > > -Original Message-
> > > > From: Honnappa Nagarahalli 
> > > > Sent: Tuesday, August 30, 2022 23:01
> > > > To: dev@dpdk.org; honnappa.nagaraha...@arm.com;
> > > ruifeng.w...@arm.com;
> > > > Matan Azrad ; Shahaf Shuler
> > ;
> > > > Slava Ovsiienko 
> > > > Cc: n...@arm.com; Matan Azrad ; sta...@dpdk.org
> > > > Subject: [PATCH v2] net/mlx5: use just sufficient barrier for Arm
> > > > platforms
> > > >
> > > > cqe->op_own indicates if the CQE is owned by the NIC. The rest of
> > > > the fields in CQE should be read only after op_own is read. On Arm
> > > > platforms using "dmb ishld" is sufficient to enforce this.
> > > >
> > > > Fixes: 88c0733535d6 ("net/mlx5: extend Rx completion with error
> > > > handling")
> > > > Cc: ma...@mellanox.com
> > > > Cc: sta...@dpdk.org
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli 
> > > > Reviewed-by: Ruifeng Wang 
> > > > ---
> > > >  drivers/common/mlx5/mlx5_common.h | 6 +-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/common/mlx5/mlx5_common.h
> > > > b/drivers/common/mlx5/mlx5_common.h
> > > > index 5028a05b49..ac2e85b15f 100644
> > > > --- a/drivers/common/mlx5/mlx5_common.h
> > > > +++ b/drivers/common/mlx5/mlx5_common.h
> > > > @@ -195,7 +195,11 @@ check_cqe(volatile struct mlx5_cqe *cqe,
> > > > const uint16_t cqes_n,
> > > >
> > > > if (unlikely((op_owner != (!!(idx))) || (op_code ==
> > > > MLX5_CQE_INVALID)))
> > > > return MLX5_CQE_STATUS_HW_OWN;
> > > > -   rte_io_rmb();
> > > > +   /* Prevent speculative reading of other fields in CQE until
> > > > +* CQE is valid.
> > > > +*/
> > > > +   rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > +
> > > > if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
> > > >  op_code == MLX5_CQE_REQ_ERR))
> > > > return MLX5_CQE_STATUS_ERR;
> > > > --
> > > > 2.17.1



RE: [PATCH] mlx5: initially reading xstats does not cause seg fault

2023-03-07 Thread Slava Ovsiienko
Hi, Huzaifa

Could you, please, format commit message with capitals in the
sentence beginnings? And explanation can be less wordy a little bit.

With best regards,
Slava

> -Original Message-
> From: huzaifa.rahman 
> Sent: четверг, 18 августа 2022 г. 15:30
> To: Matan Azrad 
> Cc: dev@dpdk.org; Slava Ovsiienko ;
> huzaifa.rahman 
> Subject: [PATCH] mlx5: initially reading xstats does not cause seg fault
> 
> Bugzilla ID: 296
> 
> the size of counters array in mlx5_xstats_get() was smaller than the memory
> we are setting for this array in mlx5_os_read_dev_counters(). due to which
> the extra memory is corrupted and thus corrupting the seemingly unrelated
> variables.
> this happens at the first run only because the n function arg of
> mlx5_xstats_get() which is used to init counters array is initialized by 
> adding
> the preceding statistics which in our case (i.e first run) is zero. after the
> initialization in
> mlx5_os_stats_init() the mlx5_stats_n is populated and thus from then
> onward the counters array size is correct
> 
> my changes will only affect the flow of the first run when we need to 
> initialize
> stats in mlx5_os_stats_init(). the size of the counters array is set 
> according the
> mlx5_stats_n variable. by doing this we will avoid the memset corrupting
> other variables` memory
> 
> Signed-off-by: huzaifa.rahman 
> ---
>  drivers/net/mlx5/mlx5_stats.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
> index f64fa3587b..bccfec10fb 100644
> --- a/drivers/net/mlx5/mlx5_stats.c
> +++ b/drivers/net/mlx5/mlx5_stats.c
> @@ -40,7 +40,6 @@ mlx5_xstats_get(struct rte_eth_dev *dev, struct
> rte_eth_xstat *stats,  {
>   struct mlx5_priv *priv = dev->data->dev_private;
>   unsigned int i;
> - uint64_t counters[n];
>   struct mlx5_xstats_ctrl *xstats_ctrl = &priv->xstats_ctrl;
>   uint16_t mlx5_stats_n = xstats_ctrl->mlx5_stats_n;
> 
> @@ -51,8 +50,11 @@ mlx5_xstats_get(struct rte_eth_dev *dev, struct
> rte_eth_xstat *stats,
>   stats_n = mlx5_os_get_stats_n(dev);
>   if (stats_n < 0)
>   return stats_n;
> - if (xstats_ctrl->stats_n != stats_n)
> + if (xstats_ctrl->stats_n != stats_n) {
>   mlx5_os_stats_init(dev);
> + n = xstats_ctrl->mlx5_stats_n;
> + }
> + uint64_t counters[n];
>   ret = mlx5_os_read_dev_counters(dev, counters);
>   if (ret)
>   return ret;
> --
> 2.25.1



RE: [PATCH] mlx5: initially reading xstats does not cause seg fault

2023-03-07 Thread Slava Ovsiienko
Hi, Huzaifa

"n" - is the parameter of the mlx5_xstats_get() routine, provided by caller.
We should not change this - it specified the size of "struct rte_eth_xstat 
*stats " array.

With best regards,
Slava

> -Original Message-
> From: huzaifa.rahman 
> Sent: четверг, 18 августа 2022 г. 15:30
> To: Matan Azrad 
> Cc: dev@dpdk.org; Slava Ovsiienko ;
> huzaifa.rahman 
> Subject: [PATCH] mlx5: initially reading xstats does not cause seg fault
> 
> Bugzilla ID: 296
> 
> the size of counters array in mlx5_xstats_get() was smaller than the memory
> we are setting for this array in mlx5_os_read_dev_counters(). due to which
> the extra memory is corrupted and thus corrupting the seemingly unrelated
> variables.
> this happens at the first run only because the n function arg of
> mlx5_xstats_get() which is used to init counters array is initialized by 
> adding
> the preceding statistics which in our case (i.e first run) is zero. after the
> initialization in
> mlx5_os_stats_init() the mlx5_stats_n is populated and thus from then
> onward the counters array size is correct
> 
> my changes will only affect the flow of the first run when we need to 
> initialize
> stats in mlx5_os_stats_init(). the size of the counters array is set 
> according the
> mlx5_stats_n variable. by doing this we will avoid the memset corrupting
> other variables` memory
> 
> Signed-off-by: huzaifa.rahman 
> ---
>  drivers/net/mlx5/mlx5_stats.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
> index f64fa3587b..bccfec10fb 100644
> --- a/drivers/net/mlx5/mlx5_stats.c
> +++ b/drivers/net/mlx5/mlx5_stats.c
> @@ -40,7 +40,6 @@ mlx5_xstats_get(struct rte_eth_dev *dev, struct
> rte_eth_xstat *stats,  {
>   struct mlx5_priv *priv = dev->data->dev_private;
>   unsigned int i;
> - uint64_t counters[n];
>   struct mlx5_xstats_ctrl *xstats_ctrl = &priv->xstats_ctrl;
>   uint16_t mlx5_stats_n = xstats_ctrl->mlx5_stats_n;
> 
> @@ -51,8 +50,11 @@ mlx5_xstats_get(struct rte_eth_dev *dev, struct
> rte_eth_xstat *stats,
>   stats_n = mlx5_os_get_stats_n(dev);
>   if (stats_n < 0)
>   return stats_n;
> - if (xstats_ctrl->stats_n != stats_n)
> + if (xstats_ctrl->stats_n != stats_n) {
>   mlx5_os_stats_init(dev);
> + n = xstats_ctrl->mlx5_stats_n;
> + }
> + uint64_t counters[n];
>   ret = mlx5_os_read_dev_counters(dev, counters);
>   if (ret)
>   return ret;
> --
> 2.25.1



[PATCH] eal: use program_invocation_short_name

2023-03-07 Thread Stephen Hemminger
Glibc already has a documented variable with the program
name so use it instead of computing the value.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/linux/eal.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index fabafbc39bd5..10cef8c90517 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -968,8 +968,6 @@ rte_eal_init(int argc, char **argv)
int i, fctret, ret;
static uint32_t run_once;
uint32_t has_run = 0;
-   const char *p;
-   static char logid[PATH_MAX];
char cpuset[RTE_CPU_AFFINITY_STR_LEN];
char thread_name[RTE_MAX_THREAD_NAME_LEN];
bool phys_addrs;
@@ -991,9 +989,6 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   p = strrchr(argv[0], '/');
-   strlcpy(logid, p ? p + 1 : argv[0], sizeof(logid));
-
eal_reset_internal_config(internal_conf);
 
/* set log level as early as possible */
@@ -1172,7 +1167,8 @@ rte_eal_init(int argc, char **argv)
 #endif
}
 
-   if (eal_log_init(logid, internal_conf->syslog_facility) < 0) {
+   if (eal_log_init(program_invocation_short_name,
+internal_conf->syslog_facility) < 0) {
rte_eal_init_alert("Cannot init logging.");
rte_errno = ENOMEM;
__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
-- 
2.39.2



RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path

2023-03-07 Thread Slava Ovsiienko
> -Original Message-
> From: Ruifeng Wang 
> Sent: четверг, 29 сентября 2022 г. 09:51
> To: Slava Ovsiienko ; Ali Alnubani
> ; Matan Azrad 
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; sta...@dpdk.org; nd ; nd
> 
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> Hi Slava,
> 
> Are there any further comments?
> 
Hi,  Ruifeng

I've recalled the context and re-reviewed the patch.
There is no performance impact and  I'm run out of objections 😊
My sincere apologizes for the long delay ☹

Acked-by: Viacheslav Ovsiienko 



[PATCH 0/3] cryptodev test improvements

2023-03-07 Thread Ciara Power
The meson driver-tests suite was missing some QAT and SW PMD
crypto tests, which are now added.

Some issues were highlighted after adding to the driver-tests
suite, which are also addressed in this patchset.

Ciara Power (3):
  test/crypto: fix skip condition for CPU crypto SGL
  test/crypto: skip asym test if driver or device missing
  app/test: add more cryptodev tests to meson suite

 app/test/meson.build   |  5 +
 app/test/test_cryptodev.c  | 15 +--
 app/test/test_cryptodev_asym.c | 12 ++--
 3 files changed, 24 insertions(+), 8 deletions(-)

-- 
2.25.1



[PATCH 2/3] test/crypto: skip asym test if driver or device missing

2023-03-07 Thread Ciara Power
Asym crypto tests returned FAILED if the required driver wasn't loaded,
or no suitable device was found.

This is now updated to return SKIPPED in these cases, which better
reflects the status of the test, and follows the convention set in the
sym crypto tests in similar circumstances.

Signed-off-by: Ciara Power 
---
 app/test/test_cryptodev_asym.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/app/test/test_cryptodev_asym.c b/app/test/test_cryptodev_asym.c
index 5b16dcab56..9236817650 100644
--- a/app/test/test_cryptodev_asym.c
+++ b/app/test/test_cryptodev_asym.c
@@ -844,7 +844,7 @@ testsuite_setup(void)
valid_devs, RTE_CRYPTO_MAX_DEVS);
if (nb_devs < 1) {
RTE_LOG(ERR, USER1, "No crypto devices found?\n");
-   return TEST_FAILED;
+   return TEST_SKIPPED;
}
 
/*
@@ -2256,7 +2256,7 @@ test_cryptodev_openssl_asym(void)
 
if (gbl_driver_id == -1) {
RTE_LOG(ERR, USER1, "OPENSSL PMD must be loaded.\n");
-   return TEST_FAILED;
+   return TEST_SKIPPED;
}
 
return unit_test_suite_runner(&cryptodev_openssl_asym_testsuite);
@@ -2270,7 +2270,7 @@ test_cryptodev_qat_asym(void)
 
if (gbl_driver_id == -1) {
RTE_LOG(ERR, USER1, "QAT PMD must be loaded.\n");
-   return TEST_FAILED;
+   return TEST_SKIPPED;
}
 
return unit_test_suite_runner(&cryptodev_qat_asym_testsuite);
@@ -2283,7 +2283,7 @@ test_cryptodev_octeontx_asym(void)
RTE_STR(CRYPTODEV_NAME_OCTEONTX_SYM_PMD));
if (gbl_driver_id == -1) {
RTE_LOG(ERR, USER1, "OCTEONTX PMD must be loaded.\n");
-   return TEST_FAILED;
+   return TEST_SKIPPED;
}
return unit_test_suite_runner(&cryptodev_octeontx_asym_testsuite);
 }
@@ -2295,7 +2295,7 @@ test_cryptodev_cn9k_asym(void)
RTE_STR(CRYPTODEV_NAME_CN9K_PMD));
if (gbl_driver_id == -1) {
RTE_LOG(ERR, USER1, "CN9K PMD must be loaded.\n");
-   return TEST_FAILED;
+   return TEST_SKIPPED;
}
 
/* Use test suite registered for crypto_octeontx PMD */
@@ -2309,7 +2309,7 @@ test_cryptodev_cn10k_asym(void)
RTE_STR(CRYPTODEV_NAME_CN10K_PMD));
if (gbl_driver_id == -1) {
RTE_LOG(ERR, USER1, "CN10K PMD must be loaded.\n");
-   return TEST_FAILED;
+   return TEST_SKIPPED;
}
 
/* Use test suite registered for crypto_octeontx PMD */
-- 
2.25.1



[PATCH 1/3] test/crypto: fix skip condition for CPU crypto SGL

2023-03-07 Thread Ciara Power
When SGL support was added to AESNI_MB PMD, the feature flag was enabled.
This meant SGL testcases were incorrectly running for the
cryptodev_cpu_aesni_mb_autotest, and SGL support was not added for
CPU crypto.

Now skipping the ZUC auth cipher SGL tests for CPU crypto,
and GCM authenticated encryption SGL tests for CPU crypto on AESNI_MB
only, as AESNI_GCM CPU crypto supports inplace SGL.

Fixes: f9dfb59edbcc ("crypto/ipsec_mb: support remaining SGL")

Signed-off-by: Ciara Power 
---
 app/test/test_cryptodev.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index aa831d79a2..b7c01a1663 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -6537,6 +6537,9 @@ test_zuc_auth_cipher_sgl(const struct wireless_test_data 
*tdata,
tdata->digest.len) < 0)
return TEST_SKIPPED;
 
+   if (gbl_action_type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO)
+   return TEST_SKIPPED;
+
rte_cryptodev_info_get(ts_params->valid_devs[0], &dev_info);
 
uint64_t feat_flags = dev_info.feature_flags;
@@ -7896,6 +7899,9 @@ test_mixed_auth_cipher_sgl(const struct 
mixed_cipher_auth_test_data *tdata,
}
}
 
+   if (gbl_action_type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO)
+   return TEST_SKIPPED;
+
/* Create the session */
if (verify)
retval = create_wireless_algo_cipher_auth_session(
@@ -14719,8 +14725,13 @@ test_authenticated_encryption_SGL(const struct 
aead_test_data *tdata,
&cap_idx) == NULL)
return TEST_SKIPPED;
 
-   /* OOP not supported with CPU crypto */
-   if (oop && gbl_action_type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO)
+   /*
+* SGL not supported on AESNI_MB PMD CPU crypto,
+* OOP not supported on AESNI_GCM CPU crypto
+*/
+   if (gbl_action_type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO &&
+   (gbl_driver_id == rte_cryptodev_driver_id_get(
+   RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD)) || oop))
return TEST_SKIPPED;
 
/* Detailed check for the particular SGL support flag */
-- 
2.25.1



[PATCH 3/3] app/test: add more cryptodev tests to meson suite

2023-03-07 Thread Ciara Power
The meson driver-tests suite did not include some ipsec_mb SW Crypto
PMD tests, and QAT tests. These are now added to avoid them being missed
if the user runs tests only using the meson suite infrastructure.

Signed-off-by: Ciara Power 
---
 app/test/meson.build | 5 +
 1 file changed, 5 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index f34d19e3c3..8c30442b62 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -300,13 +300,18 @@ perf_test_names = [
 driver_test_names = [
 'cryptodev_aesni_gcm_autotest',
 'cryptodev_aesni_mb_autotest',
+'cryptodev_chacha_poly_mb_autotest',
 'cryptodev_cn10k_autotest',
 'cryptodev_cn9k_autotest',
+'cryptodev_cpu_aesni_mb_autotest',
+'cryptodev_cpu_aesni_gcm_autotest',
 'cryptodev_dpaa2_sec_autotest',
 'cryptodev_dpaa_sec_autotest',
 'cryptodev_null_autotest',
 'cryptodev_openssl_autotest',
 'cryptodev_qat_autotest',
+'cryptodev_qat_asym_autotest',
+'cryptodev_qat_raw_api_autotest',
 'cryptodev_sw_armv8_autotest',
 'cryptodev_sw_kasumi_autotest',
 'cryptodev_sw_mvsam_autotest',
-- 
2.25.1



RE: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson suite

2023-03-07 Thread Akhil Goyal
> Subject: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson suite
> 
> The meson driver-tests suite did not include some ipsec_mb SW Crypto
> PMD tests, and QAT tests. These are now added to avoid them being missed
> if the user runs tests only using the meson suite infrastructure.
> 
> Signed-off-by: Ciara Power 
> ---
This patch is ok for now, but it is not scalable.
We should look for a way to consolidate all cryptodev autotests under one name.


RE: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson suite

2023-03-07 Thread Power, Ciara
Hi Akhil,

> -Original Message-
> From: Akhil Goyal 
> Sent: Tuesday 7 March 2023 17:23
> To: Power, Ciara ; dev@dpdk.org
> Cc: Ji, Kai 
> Subject: RE: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson
> suite
> 
> > Subject: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson
> > suite
> >
> > The meson driver-tests suite did not include some ipsec_mb SW Crypto
> > PMD tests, and QAT tests. These are now added to avoid them being
> > missed if the user runs tests only using the meson suite infrastructure.
> >
> > Signed-off-by: Ciara Power 
> > ---
> This patch is ok for now, but it is not scalable.
> We should look for a way to consolidate all cryptodev autotests under one
> name.

I actually had started out to add all the missing cryptodev tests (bcmfs, 
caam_jr, ccp, nitrox etc.),
But the docs mention it's up to the maintainer/developer to decide whether the 
tests should be included in the meson test suite or not.
So, because I was unaware of the reasoning behind leaving them out, I just 
stuck to adding QAT + ipsec_mb ones in.

I guess, if we have them all under one name, it would remove that level of 
granularity for which tests are added.
Not sure if that is needed or not?

Thanks,
Ciara


RE: [PATCH 1/3] test/crypto: fix skip condition for CPU crypto SGL

2023-03-07 Thread Ji, Kai
Acked-by: Kai Ji 

> -Original Message-
> From: Power, Ciara 
> Sent: Tuesday, March 7, 2023 5:18 PM
> To: dev@dpdk.org
> Cc: Ji, Kai ; Power, Ciara ;
> Akhil Goyal ; Fan Zhang 
> Subject: [PATCH 1/3] test/crypto: fix skip condition for CPU crypto SGL
> 
> When SGL support was added to AESNI_MB PMD, the feature flag was
> enabled.
> This meant SGL testcases were incorrectly running for the
> cryptodev_cpu_aesni_mb_autotest, and SGL support was not added for CPU
> crypto.
> 
> Now skipping the ZUC auth cipher SGL tests for CPU crypto, and GCM
> authenticated encryption SGL tests for CPU crypto on AESNI_MB only, as
> AESNI_GCM CPU crypto supports inplace SGL.
> 
> Fixes: f9dfb59edbcc ("crypto/ipsec_mb: support remaining SGL")
> 
> Signed-off-by: Ciara Power 
> ---> 2.25.1



RE: [PATCH 0/3] cryptodev test improvements

2023-03-07 Thread Ji, Kai
Acked-by: Kai Ji 

> -Original Message-
> From: Power, Ciara 
> Sent: Tuesday, March 7, 2023 5:18 PM
> To: dev@dpdk.org
> Cc: Ji, Kai ; Power, Ciara 
> Subject: [PATCH 0/3] cryptodev test improvements
> 
> The meson driver-tests suite was missing some QAT and SW PMD crypto
> tests, which are now added.
> 
> Some issues were highlighted after adding to the driver-tests suite,
> which are also addressed in this patchset.
> 
> Ciara Power (3):
>   test/crypto: fix skip condition for CPU crypto SGL
>   test/crypto: skip asym test if driver or device missing
>   app/test: add more cryptodev tests to meson suite
> 
>  app/test/meson.build   |  5 +
>  app/test/test_cryptodev.c  | 15 +--
>  app/test/test_cryptodev_asym.c | 12 ++--
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 
> --
> 2.25.1



RE: [PATCH 0/3] cryptodev test improvements

2023-03-07 Thread Ji, Kai
Series-acked-by: Kai Ji 

> -Original Message-
> From: Power, Ciara 
> Sent: Tuesday, March 7, 2023 5:18 PM
> To: dev@dpdk.org
> Cc: Ji, Kai ; Power, Ciara 
> Subject: [PATCH 0/3] cryptodev test improvements
> 
> The meson driver-tests suite was missing some QAT and SW PMD crypto
> tests, which are now added.
> 
> Some issues were highlighted after adding to the driver-tests suite,
> which are also addressed in this patchset.
> 
> Ciara Power (3):
>   test/crypto: fix skip condition for CPU crypto SGL
>   test/crypto: skip asym test if driver or device missing
>   app/test: add more cryptodev tests to meson suite
> 
>  app/test/meson.build   |  5 +
>  app/test/test_cryptodev.c  | 15 +--
>  app/test/test_cryptodev_asym.c | 12 ++--
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 
> --
> 2.25.1



RE: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson suite

2023-03-07 Thread Akhil Goyal
Hi Ciara,
> Hi Akhil,
> 
> > -Original Message-
> > From: Akhil Goyal 
> > Sent: Tuesday 7 March 2023 17:23
> > To: Power, Ciara ; dev@dpdk.org
> > Cc: Ji, Kai 
> > Subject: RE: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson
> > suite
> >
> > > Subject: [EXT] [PATCH 3/3] app/test: add more cryptodev tests to meson
> > > suite
> > >
> > > The meson driver-tests suite did not include some ipsec_mb SW Crypto
> > > PMD tests, and QAT tests. These are now added to avoid them being
> > > missed if the user runs tests only using the meson suite infrastructure.
> > >
> > > Signed-off-by: Ciara Power 
> > > ---
> > This patch is ok for now, but it is not scalable.
> > We should look for a way to consolidate all cryptodev autotests under one
> > name.
> 
> I actually had started out to add all the missing cryptodev tests (bcmfs, 
> caam_jr,
> ccp, nitrox etc.),
> But the docs mention it's up to the maintainer/developer to decide whether the
> tests should be included in the meson test suite or not.
> So, because I was unaware of the reasoning behind leaving them out, I just 
> stuck
> to adding QAT + ipsec_mb ones in.
> 
> I guess, if we have them all under one name, it would remove that level of
> granularity for which tests are added.
> Not sure if that is needed or not?
> 
>From the past some time, we have added a lot of capability checks in the test 
>cases
So that the same case may be run on all the PMDs.

I believe it should be doable at some point going forward.
This will help in reducing the unnecessary bloating of the code.
We should put in some effort to make it similar to other device test cases.
PMD specific suites are only in case of cryptodev. This should be fixed.
Moreover, it would be beneficial for all the PMDs to be tested at similar levels
Of granularity.

Regards,
Akhil


Re: [PATCH v2] eal: fix thread race in control thread creation

2023-03-07 Thread Tyler Retzlaff
On Tue, Mar 07, 2023 at 02:53:34PM +0100, David Marchand wrote:
> On Thu, Mar 2, 2023 at 5:18 AM Honnappa Nagarahalli
>  wrote:
> > > > >   if (ctrl_thread_init(arg) != 0)
> > > > >   return NULL;
> > > > >
> > > > > - return start_routine(params->arg);
> > > > > + return start_routine(start_arg);
> > > > We can free 'params' here after 'start_routine' returns.
> > >
> > > I guess it doesn't matter if the allocation is retained for the duration 
> > > of
> > > start_routine() which could be ~long.
> > Yes, that's what I thought, it is a small size.
> >
> > >
> > > David/Honnappah let me know what you decide. if you'd prefer to change to
> > > honnappah's suggestion i'll put a new version up.
> 
> Hum, how would it look like?
> - the parent thread would free params if the child thread fails to set 
> affinity,
> - the child thread would free params otherwise,
> 
> Is this what you propose?
> 
> This works, but we have to be extra careful if any change is done in
> this part later.

i think the fix i've already put up is kind of trivially verifiable as
correct, if we just need a fix then it works.

> 
> 
> -- 
> David Marchand


ixgbe rxq interrupt not working

2023-03-07 Thread Rajasekhar Pulluru
Hi Team,

Bringing-up dpdk-22.07 on an intel machine with 8 ports, 4 of them driven
by igb and the rest of the 4 ports driven by ixgbe.

I am following the below sequence to initialize these ports:

dev_conf.intr_conf.lsc = 1; //Enable link state change interrupt
dev_conf.intr_conf.rxq = 1; //Enable RX Queue Interrupt
dev_conf.rxmode.mq_mode = RTE_ETH_MQ_RX_NONE;
dev_conf.rxmode.offloads = 0;
dev_conf.txmode.mq_mode = RTE_ETH_MQ_TX_NONE;
dev_conf.txmode.offloads = 0;

rte_eth_dev_configure
rte_eth_rx_queue_setup
rte_eth_tx_queue_setup
rte_eth_dev_start
data = port_id << CHAR_BIT | queue_id;
rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
RTE_INTR_EVENT_ADD, (void *)((uintptr_t)data));
rte_eth_dev_rx_intr_enable(port_id, queue_id);

And then main loop repeats the below:

rte_epoll_wait(RTE_EPOLL_PER_THREAD, event, 1, timeout /* 200micro-sec */);
/* ignore return value */
rte_eth_dev_rx_intr_disable(port_id, queue_id);
rte_eth_rx_burst(port_id, queue_id, pkts, num_pkts);
rte_eth_dev_rx_intr_enable(port_id, queue_id);

The code is same for all the ports, igb ports are able to come-up and rx
packets, where-as the ixgbe ports are not able to rx packets at all.
cat /proc/interrupts dumps vfio-msix counters for ixgbe as 0, where-as it's
non-zero for igb.

If I don't use/enable rxq interrupt for ixgbe (and remove epoll wait,
interrupt enable/disable from while loop) and simply poll for
rte_eth_rx_burst in a loop, ixgbe ports are able to rx packets.

What could be wrong here? Appreciate any help.

I would also like to know if there's an asynchronous rxq interrupt
notification to the application instead of rte_epoll_wait (and sleep).

Thanks & Regards,
Rajasekhar


[PATCH] net/i40e: avx512 fast-free path bug fix

2023-03-07 Thread Kamalakshitha Aligeri
In i40e_tx_free_bufs_avx512 fast-free path, when cache is NULL,
non fast-free path is being executed. Fixed the bug by calling
rte_mempool_generic_put API that handles the cache==NULL case.

Fixes: 5171b4ee6b6b ("net/i40e: optimize Tx by using AVX512")
Cc: leyi.r...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Kamalakshitha Aligeri 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Feifei Wang 
---
 .mailmap|  1 +
 drivers/net/i40e/i40e_rxtx_vec_avx512.c | 12 
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/.mailmap b/.mailmap
index a9f4f28fba..2581d0efe7 100644
--- a/.mailmap
+++ b/.mailmap
@@ -677,6 +677,7 @@ Kai Ji 
 Kaiwen Deng 
 Kalesh AP 
 Kamalakannan R 
+Kamalakshitha Aligeri 
 Kamil Bednarczyk 
 Kamil Chalupnik 
 Kamil Rytarowski 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c 
b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index d3c7bfd121..ad0893324d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -783,16 +783,13 @@ i40e_tx_free_bufs_avx512(struct i40e_tx_queue *txq)
struct rte_mempool_cache *cache = rte_mempool_default_cache(mp,
rte_lcore_id());

-   if (!cache || cache->len == 0)
-   goto normal;
-
-   cache_objs = &cache->objs[cache->len];
-
-   if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
-   rte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);
+   if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
+   rte_mempool_generic_put(mp, (void *)txep, n, cache);
goto done;
}

+   cache_objs = &cache->objs[cache->len];
+
/* The cache follows the following algorithm
 *   1. Add the objects to the cache
 *   2. Anything greater than the cache min value (if it
@@ -824,7 +821,6 @@ i40e_tx_free_bufs_avx512(struct i40e_tx_queue *txq)
goto done;
}

-normal:
m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
if (likely(m)) {
free[0] = m;
--
2.25.1



RE: [PATCH] net/i40e: avx512 fast-free path bug fix

2023-03-07 Thread Morten Brørup
> From: Kamalakshitha Aligeri [mailto:kamalakshitha.alig...@arm.com]
> Sent: Tuesday, 7 March 2023 20.32
> 
> In i40e_tx_free_bufs_avx512 fast-free path, when cache is NULL,
> non fast-free path is being executed. Fixed the bug by calling
> rte_mempool_generic_put API that handles the cache==NULL case.
> 
> Fixes: 5171b4ee6b6b ("net/i40e: optimize Tx by using AVX512")
> Cc: leyi.r...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kamalakshitha Aligeri 
> Reviewed-by: Ruifeng Wang 
> Reviewed-by: Feifei Wang 
> ---
>  .mailmap|  1 +
>  drivers/net/i40e/i40e_rxtx_vec_avx512.c | 12 
>  2 files changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/.mailmap b/.mailmap
> index a9f4f28fba..2581d0efe7 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -677,6 +677,7 @@ Kai Ji 
>  Kaiwen Deng 
>  Kalesh AP 
>  Kamalakannan R 
> +Kamalakshitha Aligeri 
>  Kamil Bednarczyk 
>  Kamil Chalupnik 
>  Kamil Rytarowski 
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
> b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
> index d3c7bfd121..ad0893324d 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
> @@ -783,16 +783,13 @@ i40e_tx_free_bufs_avx512(struct i40e_tx_queue
> *txq)
>   struct rte_mempool_cache *cache =
> rte_mempool_default_cache(mp,
>   rte_lcore_id());
> 
> - if (!cache || cache->len == 0)
> - goto normal;
> -
> - cache_objs = &cache->objs[cache->len];
> -
> - if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> - rte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);
> + if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> + rte_mempool_generic_put(mp, (void *)txep, n, cache);
>   goto done;
>   }
> 
> + cache_objs = &cache->objs[cache->len];
> +
>   /* The cache follows the following algorithm
>*   1. Add the objects to the cache
>*   2. Anything greater than the cache min value (if it
> @@ -824,7 +821,6 @@ i40e_tx_free_bufs_avx512(struct i40e_tx_queue *txq)
>   goto done;
>   }
> 
> -normal:
>   m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
>   if (likely(m)) {
>   free[0] = m;
> --
> 2.25.1
> 

An improvement of the copy-paste code we are aiming to replace by proper use of 
the mempool API.

But still an improvement.

Acked-by: Morten Brørup 



Re: [PATCH v3 1/3] ethdev: enable direct rearm with separate API

2023-03-07 Thread Ferruh Yigit
On 3/7/2023 6:12 AM, Honnappa Nagarahalli wrote:
> 
> 
>>
>> On 3/6/2023 1:26 PM, Morten Brørup wrote:
 From: Ferruh Yigit [mailto:ferruh.yi...@amd.com]
 Sent: Monday, 6 March 2023 13.49

 On 1/4/2023 8:21 AM, Morten Brørup wrote:
>> From: Feifei Wang [mailto:feifei.wa...@arm.com]
>> Sent: Wednesday, 4 January 2023 08.31
>>
>> Add 'tx_fill_sw_ring' and 'rx_flush_descriptor' API into direct
>> rearm mode for separate Rx and Tx Operation. And this can support
>> different multiple sources in direct rearm mode. For examples, Rx
>> driver is ixgbe, and Tx driver is i40e.
>>
>> Suggested-by: Honnappa Nagarahalli 
>> Suggested-by: Ruifeng Wang 
>> Signed-off-by: Feifei Wang 
>> Reviewed-by: Ruifeng Wang 
>> Reviewed-by: Honnappa Nagarahalli 
>> ---
>
> This feature looks very promising for performance. I am pleased to
> see
 progress on it.
>

 Hi Morten,

 Yes it brings some performance, but not to generic use case, only to
 specific and constraint use case.
>>>
>>> I got the impression that the supported use case is a prominent and 
>>> important
>> use case.
>>>
>>
>> Can you please give real life samples for this use case, other than just 
>> showing
>> better performance number in the test bench? This helps to understand the
>> reasoning better.
> The very first patch started off with a constrained but prominent use case. 
> Though, DPU based PCIe cards running DPDK applications with 1 or max 2 ports 
> being used in tons of data centers is not a secret anymore and not a small 
> use case that can be ignored.
> However, the design of the patch has changed significantly from then. Now the 
> solution can be applied to any generic use case that uses run-to-completion 
> model of DPDK. i.e. the mapping of the RX and TX ports can be done 
> dynamically in the data plane threads. There is no need of static 
> configuration from control plane.
> 
> On the test bench, we need to make up our mind. When we see improvements, we 
> say it is just a test bench. On other occasions when the test bench does not 
> show any improvements (but improvements are shown by other metrics), we say 
> the test bench does not show any improvements.
> 
>>
>>> This is the primary argument for considering such a complex non-generic
>> feature.
> I am not sure what is the complexity here, can you please elaborate?

I am considering from user perspective.

OK, DPDK is already low level, but ethdev has only a handful of datapath
APIs (6 of them), and main ones are easy to comprehend:
rte_eth_rx_burst(port_id, queue_id, rx_pkts, nb_pkts);
rte_eth_tx_burst(port_id, queue_id, tx_pkts, nb_pkts);

They (magically) Rx/Tx buffers, easy to grasp.

Maybe rte_eth_tx_prepare() is a little less obvious (why/when to use
it), but still I believe simple.

Whoever looks to these APIs can figure out how to use in the application.

The other three is related to the descriptors and I am not sure about
their use-case, I assume they are mostly good for debugging.


But now we are adding new datapath APIs:
rte_eth_tx_fill_sw_ring(port_id, queue_id, rxq_rearm_data);
rte_eth_rx_flush_descriptor(port_id, queue_id, nb_rearm);

When you talk about SW ring and re-arming descriptors I believe you will
loose most of the users already, driver developers will know what it is,
you will know what that is, but people who are not close to the Ethernet
HW won't.

And these APIs will be very visible, not like one of many control plane
dev_ops. So this can confuse users who are not familiar with details.

Usage of these APIs comes with restrictions, it is possible that at some
percentage of users will miss these restrictions or miss-understand them
and will have issues.

Or many may be intimidated by them and stay away from using these APIs,
leaving them as a burden to maintain, to test, to fix. That is why I
think a real life usecase is needed, in that case at least we will know
some consumers will fix or let us know when they get broken.

It may be possible to hide details under driver and user only set an
offload flag, similar to FAST_FREE, but in that case feature will loose
flexibility and it will be even more specific, perhaps making it less
useful.


> I see other patches/designs (ex: proactive error recovery) which are way more 
> complex to understand and comprehend.
> 
>>>

 And changes are relatively invasive comparing the usecase it
 supports, like it adds new two inline datapath functions and a new dev_ops.

 I am worried the unnecessary complexity and possible regressions in
 the fundamental and simple parts of the project, with a good
 intention to gain a few percentage performance in a specific usecase,
 can hurt the project.
> I agree that we are touching some fundamental parts of the project. But, we 
> also need to realize that those fundamental parts were not developed on 
> architectures that have joined the projec

Re: ixgbe rxq interrupt not working

2023-03-07 Thread Stephen Hemminger
On Wed, 8 Mar 2023 00:22:10 +0530
Rajasekhar Pulluru  wrote:

> Hi Team,
> 
> Bringing-up dpdk-22.07 on an intel machine with 8 ports, 4 of them driven
> by igb and the rest of the 4 ports driven by ixgbe.


FYI - 22.07 is not a supported release, only LTS releases like 22.11 are 
supported.


RE: ixgbe rxq interrupt not working

2023-03-07 Thread Honnappa Nagarahalli


From: Rajasekhar Pulluru  
Sent: Tuesday, March 7, 2023 12:52 PM
To: dev@dpdk.org
Subject: ixgbe rxq interrupt not working

Hi Team,

Bringing-up dpdk-22.07 on an intel machine with 8 ports, 4 of them driven by 
igb and the rest of the 4 ports driven by ixgbe. 
[Honnappa]  Do you have packets crossing between the 2 drivers?


I am following the below sequence to initialize these ports:

dev_conf.intr_conf.lsc = 1; //Enable link state change interrupt
dev_conf.intr_conf.rxq = 1; //Enable RX Queue Interrupt
dev_conf.rxmode.mq_mode = RTE_ETH_MQ_RX_NONE;
dev_conf.rxmode.offloads = 0;
dev_conf.txmode.mq_mode = RTE_ETH_MQ_TX_NONE;
dev_conf.txmode.offloads = 0;

rte_eth_dev_configure
rte_eth_rx_queue_setup
rte_eth_tx_queue_setup
rte_eth_dev_start
data = port_id << CHAR_BIT | queue_id;
rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD, 
RTE_INTR_EVENT_ADD, (void *)((uintptr_t)data));
rte_eth_dev_rx_intr_enable(port_id, queue_id);

And then main loop repeats the below:

rte_epoll_wait(RTE_EPOLL_PER_THREAD, event, 1, timeout /* 200micro-sec */); /* 
ignore return value */
rte_eth_dev_rx_intr_disable(port_id, queue_id);
rte_eth_rx_burst(port_id, queue_id, pkts, num_pkts);
rte_eth_dev_rx_intr_enable(port_id, queue_id);

The code is same for all the ports, igb ports are able to come-up and rx 
packets, where-as the ixgbe ports are not able to rx packets at all.
cat /proc/interrupts dumps vfio-msix counters for ixgbe as 0, where-as it's 
non-zero for igb. 
If I don't use/enable rxq interrupt for ixgbe (and remove epoll wait, interrupt 
enable/disable from while loop) and simply poll for rte_eth_rx_burst in a loop, 
ixgbe ports are able to rx packets.

What could be wrong here? Appreciate any help.

I would also like to know if there's an asynchronous rxq interrupt notification 
to the application instead of rte_epoll_wait (and sleep).

Thanks & Regards,
Rajasekhar


Re: [PATCH v3 2/2] eal: add option to put timestamp on console output

2023-03-07 Thread fengchengwen



On 2023/3/8 0:06, Stephen Hemminger wrote:
> On Tue, 7 Mar 2023 17:35:32 +0800
> fengchengwen  wrote:
> 
>> The syslog will add timestamp, but the syslog backend will re-write 
>> timestamp, so
>> in the last, you can't find the real-timestamp of this log print. sometimes 
>> it requires
>> to get real log time.
>> PS: we found it in our test environment because RR schedule hang too long 
>> (similar question
>> also found: https://bugzilla.redhat.com/show_bug.cgi?id=1855447).
>>
>> So suggest add timestamp in syslog string also, and don't convert to 
>> monotonic and just
>> print as normal format (just like syslog).
> 
> 
> Are you using systemd?

Yes

> 
> Never, never configure a DPDK application with real-time process priority.
> Polling model and RT don't mix.

Maybe we should document them ?

> .
> 


RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode

2023-03-07 Thread Honnappa Nagarahalli


> >
> >>>
> >>> Is there any reason not to design this in the same way as
> >> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>
> >> I suppose it is a question for the authors of original patch...
> > Appreciate if the authors could comment on this.
> 
> The main cause is that the hardware implementation limit, I will try to 
> explain
> from hns3 PMD's view.
> For a global reset, all the function need responsed within a centain period of
> time. otherwise, the reset will fail. and also the reset requirement a few 
> steps (all
> may take a long time).
> 
> When with multiple functions in one DPDK, and trigger a global reset, the
> rte_eth_dev_reset will not cover this scene:
> 1. each port's will report RTE_ETH_EVENT_INTR_RESET in interrupt thread.
> 2. then invoke application callback, but due to the same thread, and each
> port's recover will take a long time, so later port will reset failed.
If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and 
rte_eth_dev_recover, what problems do you see?

> 
> >
> >>
> >>> We could have a similar API 'rte_eth_dev_recover' to do the recovery
> >> functionality.
> >>
> >> I suppose such approach is also possible.
> >> Personally I am fine with both ways: either existing one or what you
> >> propose, as long as we'll fix existing race-condition.
> >> What is good with what you suggest - that way we probably don't need
> >> to worry how to allow user to enable/disable auto-recovery inside PMD.
> >>
> >> Konstantin
> >>
> >


Re: [PATCH v3 2/2] eal: add option to put timestamp on console output

2023-03-07 Thread Stephen Hemminger
On Wed, 8 Mar 2023 08:36:48 +0800
fengchengwen  wrote:

> On 2023/3/8 0:06, Stephen Hemminger wrote:
> > On Tue, 7 Mar 2023 17:35:32 +0800
> > fengchengwen  wrote:
> >   
> >> The syslog will add timestamp, but the syslog backend will re-write 
> >> timestamp, so
> >> in the last, you can't find the real-timestamp of this log print. 
> >> sometimes it requires
> >> to get real log time.
> >> PS: we found it in our test environment because RR schedule hang too long 
> >> (similar question
> >> also found: https://bugzilla.redhat.com/show_bug.cgi?id=1855447).
> >>
> >> So suggest add timestamp in syslog string also, and don't convert to 
> >> monotonic and just
> >> print as normal format (just like syslog).  
> > 
> > 
> > Are you using systemd?  
> 
> Yes

There is redhat bug about this:
https://bugzilla.redhat.com/show_bug.cgi?id=991678

> > Never, never configure a DPDK application with real-time process priority.
> > Polling model and RT don't mix.  
> 
> Maybe we should document them ?

Part of previous discussion here:
https://mails.dpdk.org/archives/dev/2021-April/203778.html

In my experience running DPDK on isolated threads (cgroup or scheduler 
isolation)
combined with remapping interrupts gives best response without lockup.

I.e don't depend on scheduler to do the right thing. Instead ensure that
each thread runs on dedicated CPU.





RE: [PATCH v3] app/testpmd: fix secondary process not forwarding

2023-03-07 Thread He, ShiyangX


>-Original Message-
>From: Ferruh Yigit 
>Sent: Tuesday, March 7, 2023 7:41 PM
>To: He, ShiyangX ; dev@dpdk.org
>Cc: Zhou, YidingX ; sta...@dpdk.org; Zhang, Yuying
>; Singh, Aman Deep
>; Burakov, Anatoly
>; Matan Azrad ; Dmitry
>Kozlyuk 
>Subject: Re: [PATCH v3] app/testpmd: fix secondary process not forwarding
>
>On 3/7/2023 3:25 AM, He, ShiyangX wrote:
>>
>>
>>> -Original Message-
>>> From: Ferruh Yigit 
>>> Sent: Monday, March 6, 2023 11:06 PM
>>> To: He, ShiyangX ; dev@dpdk.org
>>> Cc: Zhou, YidingX ; sta...@dpdk.org; Zhang,
>>> Yuying ; Singh, Aman Deep
>>> ; Burakov, Anatoly
>>> ; Matan Azrad ; Dmitry
>>> Kozlyuk 
>>> Subject: Re: [PATCH v3] app/testpmd: fix secondary process not
>>> forwarding
>>>
>>> On 2/23/2023 2:41 PM, Shiyang He wrote:
 Under multi-process scenario, the secondary process gets queue state
 from the wrong location (the global variable 'ports'). Therefore,
 the secondary process can not forward since "stream_init" is not called.

 This commit fixes the issue by calling 'rte_eth_rx/tx_queue_info_get'
 to get queue state from shared memory.

 Fixes: 3c4426db54fc ("app/testpmd: do not poll stopped queues")
 Cc: sta...@dpdk.org

 Signed-off-by: Shiyang He 
 Acked-by: Yuying Zhang 

 v3: Add return value description
 ---
  app/test-pmd/testpmd.c | 45
 --
  1 file changed, 43 insertions(+), 2 deletions(-)

 diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
 0c14325b8d..a050472aea 100644
 --- a/app/test-pmd/testpmd.c
 +++ b/app/test-pmd/testpmd.c
 @@ -2418,9 +2418,50 @@ start_packet_forwarding(int with_tx_first)
if (!pkt_fwd_shared_rxq_check())
return;

 -  if (stream_init != NULL)
 -  for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++)
 +  if (stream_init != NULL) {
 +  for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++) {
 +  if (rte_eal_process_type() == RTE_PROC_SECONDARY)
>>> {
 +  struct fwd_stream *fs = fwd_streams[i];
 +  struct rte_eth_rxq_info rx_qinfo;
 +  struct rte_eth_txq_info tx_qinfo;
 +  int32_t rc;
 +  rc = rte_eth_rx_queue_info_get(fs->rx_port,
 +  fs->rx_queue, &rx_qinfo);
 +  if (rc == 0) {
 +  ports[fs->rx_port].rxq[fs-
 rx_queue].state =
 +  rx_qinfo.queue_state;
 +  } else if (rc == -ENOTSUP) {
 +  /* Set the rxq state to
>>> RTE_ETH_QUEUE_STATE_STARTED
 +   * to ensure that the PMDs do not
>>> implement
 +   * rte_eth_rx_queue_info_get can
>>> forward.
 +   */
 +  ports[fs->rx_port].rxq[fs-
 rx_queue].state =
 +
>>> RTE_ETH_QUEUE_STATE_STARTED;
 +  } else {
 +  TESTPMD_LOG(WARNING,
 +  "Failed to get rx queue
>>> info\n");
 +  }
 +
 +  rc = rte_eth_tx_queue_info_get(fs->tx_port,
 +  fs->tx_queue, &tx_qinfo);
 +  if (rc == 0) {
 +  ports[fs->tx_port].txq[fs-
 tx_queue].state =
 +  tx_qinfo.queue_state;
 +  } else if (rc == -ENOTSUP) {
 +  /* Set the txq state to
>>> RTE_ETH_QUEUE_STATE_STARTED
 +   * to ensure that the PMDs do not
>>> implement
 +   * rte_eth_tx_queue_info_get can
>>> forward.
 +   */
 +  ports[fs->tx_port].txq[fs-
 tx_queue].state =
 +
>>> RTE_ETH_QUEUE_STATE_STARTED;
 +  } else {
 +  TESTPMD_LOG(WARNING,
 +  "Failed to get tx queue
>>> info\n");
 +  }
 +  }
stream_init(fwd_streams[i]);
 +  }
 +  }

>>>
>>>
>>> Testpmd duplicates some dpdk/ethdev state/config in application
>>> level, and this can bite in multiple cases, as it is happening here.
>>>
>>> I am not sure if this was a design decision, but I think instead of
>>> testpmd storing ethdev related state/config in application level, it
>>> should store only application level state/config, and when ethdev
>>> 

[PATCH] net/nfp: fix MTU configuration order

2023-03-07 Thread Chaoyong He
From: Peng Zhang 

If rte_eth_dev_set_mtu() is called before rte_eth_rx_queue_setup() the
NFP driver setup fails. This is because the default values evaluated
when setting the MTU are initialized in the rte_eth_rx_queue_setup()
code path. Fix this by instead initializing the MTU default values in
the device initialization, in nfp_net_init() and the check also is
conducted in nfp_net_start(), so it doesn't influence the result.

This was found by using DPDK with OVS.

Fixes: dbad6f64f921 ("net/nfp: fix internal buffer size and MTU check")
Cc: sta...@dpdk.org

Signed-off-by: Peng Zhang 
Reviewed-by: Chaoyong He 
Reviewed-by: Niklas Söderlund 
---
 drivers/net/nfp/flower/nfp_flower.c | 1 -
 drivers/net/nfp/nfp_common.c| 4 ++--
 drivers/net/nfp/nfp_common.h| 1 +
 drivers/net/nfp/nfp_ethdev.c| 1 +
 drivers/net/nfp/nfp_ethdev_vf.c | 1 +
 5 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/nfp/flower/nfp_flower.c 
b/drivers/net/nfp/flower/nfp_flower.c
index 2c797ae751..6f197396a4 100644
--- a/drivers/net/nfp/flower/nfp_flower.c
+++ b/drivers/net/nfp/flower/nfp_flower.c
@@ -24,7 +24,6 @@
 #include "nfp_flower_cmsg.h"
 
 #define CTRL_VNIC_NB_DESC 512
-#define DEFAULT_FLBUF_SIZE 9216
 
 static void
 nfp_pf_repr_enable_queues(struct rte_eth_dev *dev)
diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index 5922bfea8e..5d92b476e2 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1126,9 +1126,9 @@ nfp_net_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return -EBUSY;
}
 
-   /* MTU larger then current mbufsize not supported */
+   /* MTU larger than current mbufsize not supported */
if (mtu > hw->flbufsz) {
-   PMD_DRV_LOG(ERR, "MTU (%u) larger then current mbufsize (%u) 
not supported",
+   PMD_DRV_LOG(ERR, "MTU (%u) larger than current mbufsize (%u) 
not supported",
mtu, hw->flbufsz);
return -ERANGE;
}
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 49c89ac327..4486ffa72c 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -111,6 +111,7 @@ struct nfp_net_adapter;
 
 /* Maximum supported NFP frame size (MTU + layer 2 headers) */
 #define NFP_FRAME_SIZE_MAX 10048
+#define DEFAULT_FLBUF_SIZE9216
 
 #include 
 #include 
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index 47d5dff16c..56fb8e8c73 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -603,6 +603,7 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
hw->cap = nn_cfg_readl(hw, NFP_NET_CFG_CAP);
hw->max_mtu = nn_cfg_readl(hw, NFP_NET_CFG_MAX_MTU);
hw->mtu = RTE_ETHER_MTU;
+   hw->flbufsz = DEFAULT_FLBUF_SIZE;
 
/* VLAN insertion is incompatible with LSOv2 */
if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index 7834b2ee0c..d69ac8cd37 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -365,6 +365,7 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
hw->cap = nn_cfg_readl(hw, NFP_NET_CFG_CAP);
hw->max_mtu = nn_cfg_readl(hw, NFP_NET_CFG_MAX_MTU);
hw->mtu = RTE_ETHER_MTU;
+   hw->flbufsz = DEFAULT_FLBUF_SIZE;
 
/* VLAN insertion is incompatible with LSOv2 */
if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
-- 
2.39.1



Reminder - DPDK Tech Board Call - Tomorrow Wed. 3/8 at 7am PT/10am ET/1600h CET

2023-03-07 Thread Nathan Southern
Good evening,

Tomorrow, Wed. Mar. 8, 2023, the DPDK Project will hold its biweekly tech
board meeting, at 7am PT/10am ET/1600h CET.

You may connect to the meeting here: https://meet.jit.si/dpdk


And our agenda will be posted here:
*https://annuel.framapad.org/p/r.0c3cc4d1e011214183872a98f6b5c7db
*

We look forward to seeing you tomorrow

Thanks much,

Nathan

Nathan C. Southern, Project Coordinator

Data Plane Development Kit

The Linux Foundation

248.835.4812 (mobile)

nsouth...@linuxfoundation.org


Re: [PATCH] net/nfp: fix MTU configuration order

2023-03-07 Thread Stephen Hemminger
On Wed,  8 Mar 2023 10:33:18 +0800
Chaoyong He  wrote:

> diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
> index 5922bfea8e..5d92b476e2 100644
> --- a/drivers/net/nfp/nfp_common.c
> +++ b/drivers/net/nfp/nfp_common.c
> @@ -1126,9 +1126,9 @@ nfp_net_dev_mtu_set(struct rte_eth_dev *dev, uint16_t 
> mtu)
>   return -EBUSY;
>   }
>  
> - /* MTU larger then current mbufsize not supported */
> + /* MTU larger than current mbufsize not supported */
>   if (mtu > hw->flbufsz) {
> - PMD_DRV_LOG(ERR, "MTU (%u) larger then current mbufsize (%u) 
> not supported",
> + PMD_DRV_LOG(ERR, "MTU (%u) larger than current mbufsize (%u) 
> not supported",
>   mtu, hw->flbufsz);
>   return -ERANGE;
>   }

Patch looks good but this looks like unrelated whitespace change.


Re: [PATCH v3 2/2] eal: add option to put timestamp on console output

2023-03-07 Thread Stephen Hemminger
On Wed, 8 Mar 2023 08:36:48 +0800
fengchengwen  wrote:

> > Never, never configure a DPDK application with real-time process priority.
> > Polling model and RT don't mix.  
> 
> Maybe we should document them ?


LOL - ChatGPT is amusing if you ask it. The answer is pro grammatically
correct, but will not work. The caveat at the end is much bigger than it
understands.

Question:

Is Dataplane Development Kit (DPDK) compatible with Linux Real Time thread 
priority like SCH_FIFO? 
Yes, Dataplane Development Kit (DPDK) is compatible with Linux Real-Time thread 
priority scheduling policies, such as SCHED_FIFO.

Answer:

DPDK supports setting the thread priority using the standard Linux pthread 
APIs, including pthread_setschedparam() and pthread_attr_setschedpolicy(). This 
allows the application to configure the thread scheduling policy and priority 
for DPDK worker threads.

To set the real-time priority of DPDK threads, you can use the 
pthread_setschedparam() function to set the scheduling policy to SCHED_FIFO or 
SCHED_RR and the priority value to a value between 1 and 99. Here is an example 
of how to set the real-time priority of a DPDK thread:

> pthread_t tid;
> struct sched_param sched_param;
> pthread_attr_t attr;
> 
> pthread_attr_init(&attr);
> pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
> 
> sched_param.sched_priority = 50;
> pthread_attr_setschedparam(&attr, &sched_param);
> pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
> 
> pthread_create(&tid, &attr, thread_func, arg);


In this example, the SCHED_FIFO scheduling policy is used, and the priority 
value is set to 50. You can adjust the priority value according to your needs.

Note that using real-time scheduling policies like SCHED_FIFO or SCHED_RR 
requires careful consideration of system resources and may impact system 
performance if not used properly. It is recommended to consult the Linux 
documentation and to test the system thoroughly before deploying any 
application using real-time scheduling policies.


Re: [PATCH v3] app/testpmd: fix secondary process not forwarding

2023-03-07 Thread lihuisong (C)



在 2023/3/8 10:05, He, ShiyangX 写道:



-Original Message-
From: Ferruh Yigit 
Sent: Tuesday, March 7, 2023 7:41 PM
To: He, ShiyangX ; dev@dpdk.org
Cc: Zhou, YidingX ; sta...@dpdk.org; Zhang, Yuying
; Singh, Aman Deep
; Burakov, Anatoly
; Matan Azrad ; Dmitry
Kozlyuk 
Subject: Re: [PATCH v3] app/testpmd: fix secondary process not forwarding

On 3/7/2023 3:25 AM, He, ShiyangX wrote:



-Original Message-
From: Ferruh Yigit 
Sent: Monday, March 6, 2023 11:06 PM
To: He, ShiyangX ; dev@dpdk.org
Cc: Zhou, YidingX ; sta...@dpdk.org; Zhang,
Yuying ; Singh, Aman Deep
; Burakov, Anatoly
; Matan Azrad ; Dmitry
Kozlyuk 
Subject: Re: [PATCH v3] app/testpmd: fix secondary process not
forwarding

On 2/23/2023 2:41 PM, Shiyang He wrote:

Under multi-process scenario, the secondary process gets queue state
from the wrong location (the global variable 'ports'). Therefore,
the secondary process can not forward since "stream_init" is not called.

This commit fixes the issue by calling 'rte_eth_rx/tx_queue_info_get'
to get queue state from shared memory.

Fixes: 3c4426db54fc ("app/testpmd: do not poll stopped queues")
Cc: sta...@dpdk.org

Signed-off-by: Shiyang He 
Acked-by: Yuying Zhang 

v3: Add return value description
---
  app/test-pmd/testpmd.c | 45
--
  1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
0c14325b8d..a050472aea 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2418,9 +2418,50 @@ start_packet_forwarding(int with_tx_first)
if (!pkt_fwd_shared_rxq_check())
return;

-   if (stream_init != NULL)
-   for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++)
+   if (stream_init != NULL) {
+   for (i = 0; i < cur_fwd_config.nb_fwd_streams; i++) {
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY)

{

+   struct fwd_stream *fs = fwd_streams[i];
+   struct rte_eth_rxq_info rx_qinfo;
+   struct rte_eth_txq_info tx_qinfo;
+   int32_t rc;
+   rc = rte_eth_rx_queue_info_get(fs->rx_port,
+   fs->rx_queue, &rx_qinfo);
+   if (rc == 0) {
+   ports[fs->rx_port].rxq[fs-
rx_queue].state =
+   rx_qinfo.queue_state;
+   } else if (rc == -ENOTSUP) {
+   /* Set the rxq state to

RTE_ETH_QUEUE_STATE_STARTED

+* to ensure that the PMDs do not

implement

+* rte_eth_rx_queue_info_get can

forward.

+*/
+   ports[fs->rx_port].rxq[fs-
rx_queue].state =
+

RTE_ETH_QUEUE_STATE_STARTED;

+   } else {
+   TESTPMD_LOG(WARNING,
+   "Failed to get rx queue

info\n");

+   }
+
+   rc = rte_eth_tx_queue_info_get(fs->tx_port,
+   fs->tx_queue, &tx_qinfo);
+   if (rc == 0) {
+   ports[fs->tx_port].txq[fs-
tx_queue].state =
+   tx_qinfo.queue_state;
+   } else if (rc == -ENOTSUP) {
+   /* Set the txq state to

RTE_ETH_QUEUE_STATE_STARTED

+* to ensure that the PMDs do not

implement

+* rte_eth_tx_queue_info_get can

forward.

+*/
+   ports[fs->tx_port].txq[fs-
tx_queue].state =
+

RTE_ETH_QUEUE_STATE_STARTED;

+   } else {
+   TESTPMD_LOG(WARNING,
+   "Failed to get tx queue

info\n");

+   }
+   }
stream_init(fwd_streams[i]);
+   }
+   }



Testpmd duplicates some dpdk/ethdev state/config in application
level, and this can bite in multiple cases, as it is happening here.

I am not sure if this was a design decision, but I think instead of
testpmd storing ethdev related state/config in application level, it
should store only application level state/config, and when ethdev
related state/config is required app should get it directly from ethdev.

It may be too late already for testpmd, there is a mixed usage, but I
am for preferring this approach when there is an opportunity.



For above issue, why

RE: [PATCH 0/5] net/mlx5: add indirect QUOTA create/query/modify

2023-03-07 Thread Suanming Mou
Hi Gregory,

The code looks good to me. But I assume doc update is missing here. Can you 
please update the relevant doc and release notes?

BR,
Suanming Mou

> -Original Message-
> From: Gregory Etelson 
> Sent: Wednesday, January 18, 2023 8:56 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson ; Matan Azrad
> ; Raslan Darawsheh 
> Subject: [PATCH 0/5] net/mlx5: add indirect QUOTA create/query/modify
> 
> Add indirect quota flow action.
> Add match on quota flow item.
> 
> Gregory Etelson (5):
>   net/mlx5: update query fields in async job structure
>   net/mlx5: remove code duplication
>   common/mlx5: update MTR ASO definitions
>   net/mlx5: add indirect QUOTA create/query/modify
>   mlx5dr: Definer, translate RTE quota item
> 
>  drivers/common/mlx5/mlx5_prm.h|   4 +
>  drivers/net/mlx5/hws/mlx5dr_definer.c |  61 +++
>  drivers/net/mlx5/meson.build  |   1 +
>  drivers/net/mlx5/mlx5.h   |  88 +++-
>  drivers/net/mlx5/mlx5_flow.c  |  62 +++
>  drivers/net/mlx5/mlx5_flow.h  |  20 +-
>  drivers/net/mlx5/mlx5_flow_aso.c  |  10 +-
>  drivers/net/mlx5/mlx5_flow_hw.c   | 527 +--
>  drivers/net/mlx5/mlx5_flow_quota.c| 726 ++
>  9 files changed, 1318 insertions(+), 181 deletions(-)  create mode 100644
> drivers/net/mlx5/mlx5_flow_quota.c
> 
> --
> 2.34.1



RE: [PATCH] net/mlx5: reject flow template API reconfiguration

2023-03-07 Thread Suanming Mou



> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Sunday, February 26, 2023 3:58 AM
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org
> Subject: [PATCH] net/mlx5: reject flow template API reconfiguration
> 
> Flow Template API allows rte_flow_configure() to be called more than once, to
> allow applications to dynamically reconfigure the amount of resources 
> available
> for flow table and flow rule creation.
> PMDs can reject the change in configuration and keep the old configuration.
> 
> Before this patch, during Flow Template API reconfiguration in mlx5 PMD, all 
> the
> allocated resources (HW/FW object pools, flow rules, pattern and actions
> templates) were released and reallocated according to the new configuration.
> This however leads to undefined behavior, since all references to templates,
> flows or indirect actions, which are held by the user, are now invalidated.
> 
> This patch changes the reconfiguration behavior.
> Configuration provided to rte_flow_configure() is stored in port's private 
> data
> structure. On any subsequent call to rte_flow_configure(), the provided
> configuration is compared against the stored configuration.
> If both are equal, rte_flow_configure() reports a success and configuration is
> unaffected. Otherwise, (-ENOTSUP) is returned.
> 
> Signed-off-by: Dariusz Sosnowski 
Acked-by: Suanming Mou 


RE: [PATCH 1/2] net/mlx5: fix egress group translation in HWS

2023-03-07 Thread Suanming Mou



> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Sunday, February 26, 2023 4:18 AM
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org; sta...@dpdk.org; Ori Kam 
> Subject: [PATCH 1/2] net/mlx5: fix egress group translation in HWS
> 
> With HW Steering enabled creating egress template tables and egress flow rules
> on E-Switch setups is allowed.
> To enable it, PMD creates a set of default egress flow rules responsible for:
> 
> - Storing representor ID (vport tag is used) in HW register.
>   This is used for traffic source identification.
> - Copying software metadata to proper HW register to allow
>   preserving metadata across domains.
> 
> Structure of these flow rules and whether they are inserted depend on the
> device configuration.
> There are the following cases:
> 
> 1. repr_matching=1 and dv_xmeta_en=4
>- An egress flow rule in group 0 is created for each Tx queue;
>- Flow rule matching SQ number - fills unused REG_C_0 bits
>  with vport tag, copies REG_A to REG_C_1 and jumps to group 1.
> 2. repr_matching=1 and dv_xmeta_en=0
>- An egress flow rule in group 0 is created for each Tx queue;
>- Flow rule matching SQ number - fills unused REG_C_0 bits
>  with vport tag and jumps to group 1.
> 3. repr_matching=0 and dv_xmeta_en=4
>- A single egress flow rule in group 0 is created;
>- Flow rule matches all E-Switch manager TX traffic,
>  copies REG_A to REG_C and jumps to group 1.
> 4. repr_matching=0 and dv_xmeta_en=0 - no default flow rules are added.
> 
> When default egress flow rules are required, they are inserted in group 0 and
> this group is reserved for PMD purposes.
> User created template tables must be created in higher groups.
> As a result, on template table creation PMD is translating the provided group
> (incrementing it in that case).
> 
> Before this patch, a condition used to check if translation of egress flow 
> group is
> needed was incorrect. It did not allow translation if both representor 
> matching
> AND extended metadata mode were enabled.
> 
> This patch fixes this condition - translation is allowed if and only if 
> representor
> matching OR extended metadata mode is enabled.
> 
> Fixes: 483181f7b6dd ("net/mlx5: support device control of representor
> matching")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Dariusz Sosnowski 
> Acked-by: Ori Kam 
Acked-by: Suanming Mou 



RE: [PATCH 2/2] net/mlx5: fix isolated mode when repr matching is disabled

2023-03-07 Thread Suanming Mou



> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Sunday, February 26, 2023 4:18 AM
> To: Matan Azrad ; Slava Ovsiienko
> 
> Cc: dev@dpdk.org; sta...@dpdk.org; Ori Kam 
> Subject: [PATCH 2/2] net/mlx5: fix isolated mode when repr matching is 
> disabled
> 
> In HW steering mode, when running on an E-Switch setup,
> mlx5 PMD provides an ability to enable or disable representor matching 
> (through
> `repr_matching_en` device argument).
> If representor matching is enabled, any ingress or egress flow rule, created 
> on
> any port representor will match traffic related to that specific port.
> If it is disabled, flow rule created on one of the ports, will match traffic 
> related
> to all ports.
> 
> As a result, when representor matching is disabled, PMD cannot correctly 
> create
> control flow rules for receiving default traffic according to port 
> configuration.
> Since each port representor in the same switch domain, can have different port
> configuration and flow rules do not differentiate between ports, these flow
> rules cannot be correctly applied.
> In that case, each port works in de facto isolated mode.
> 
> This patch makes sure that if representor matching is disabled, port is forced
> into isolated mode. Disabling flow isolated is forbidden.
> 
> Fixes: 483181f7b6dd ("net/mlx5: support device control of representor
> matching")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Dariusz Sosnowski 
> Acked-by: Ori Kam 
Acked-by: Suanming Mou 

Thanks.


RE: [PATCH v3 2/2] net/mlx5: add MPLS tunnel support for HWS

2023-03-07 Thread Suanming Mou



> -Original Message-
> From: Michael Baum 
> Sent: Thursday, February 23, 2023 3:48 PM
> To: dev@dpdk.org
> Cc: Matan Azrad ; Raslan Darawsheh
> ; Slava Ovsiienko ; Ori Kam
> 
> Subject: [PATCH v3 2/2] net/mlx5: add MPLS tunnel support for HWS
> 
> Add support for MPLS tunnel item in HWS.
> 
> Signed-off-by: Michael Baum 
> Acked-by: Ori Kam 
Acked-by: Suanming Mou 


[PATCH v2] net/ice: fix incorrect Rx timestamp

2023-03-07 Thread Simei Su
For E822, the time value in Rx Flex Descriptors is 0 due to the
missing PHY clock timer setup. Also, the source clock index in use
is based on device capabilities instead of always being zero.

Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex descriptor")
Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
Fixes: fb800fde66f4 ("net/ice/base: work around missing PTP capabilities")
Cc: sta...@dpdk.org

Signed-off-by: Simei Su 
---
v2:
* Refine commit title and commit log.
* Remove duplicate code.
* Rework share code for "SIMICS_SUPPORT".

 drivers/net/ice/base/ice_common.c |  4 +---
 drivers/net/ice/ice_ethdev.c  | 36 ++--
 drivers/net/ice/ice_rxtx.h| 11 ++-
 3 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ice/base/ice_common.c 
b/drivers/net/ice/base/ice_common.c
index 5391bd6..1a02aad 100644
--- a/drivers/net/ice/base/ice_common.c
+++ b/drivers/net/ice/base/ice_common.c
@@ -2554,9 +2554,7 @@ ice_parse_1588_func_caps(struct ice_hw *hw, struct 
ice_hw_func_caps *func_p,
 struct ice_aqc_list_caps_elem *cap)
 {
struct ice_ts_func_info *info = &func_p->ts_func_info;
-   u32 number = ICE_TS_FUNC_ENA_M | ICE_TS_SRC_TMR_OWND_M |
-ICE_TS_TMR_ENA_M | ICE_TS_TMR_IDX_OWND_M |
-ICE_TS_TMR_IDX_ASSOC_M;
+   u32 number = LE32_TO_CPU(cap->number);
u8 clk_freq;
 
ice_debug(hw, ICE_DBG_INIT, "1588 func caps: raw value %x\n", number);
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 0d011bb..9a88cf9 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -2413,6 +2413,17 @@ ice_dev_init(struct rte_eth_dev *dev)
/* Initialize TM configuration */
ice_tm_conf_init(dev);
 
+   if (ice_is_e810(hw))
+   hw->phy_cfg = ICE_PHY_E810;
+   else
+   hw->phy_cfg = ICE_PHY_E822;
+
+   if (hw->phy_cfg == ICE_PHY_E822) {
+   ret = ice_start_phy_timer_e822(hw, hw->pf_id, true);
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to start phy timer\n");
+   }
+
if (!ad->is_safe_mode) {
ret = ice_flow_init(ad);
if (ret) {
@@ -5814,11 +5825,6 @@ ice_timesync_enable(struct rte_eth_dev *dev)
return -1;
}
 
-   if (ice_is_e810(hw))
-   hw->phy_cfg = ICE_PHY_E810;
-   else
-   hw->phy_cfg = ICE_PHY_E822;
-
if (hw->func_caps.ts_func_info.src_tmr_owned) {
ret = ice_ptp_init_phc(hw);
if (ret) {
@@ -5939,16 +5945,17 @@ ice_timesync_read_time(struct rte_eth_dev *dev, struct 
timespec *ts)
struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ice_adapter *ad =
ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc;
uint32_t hi, lo, lo2;
uint64_t time, ns;
 
-   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
-   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
-   lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
+   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
+   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx));
+   lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
 
if (lo2 < lo) {
-   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
-   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
+   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
+   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx));
}
 
time = ((uint64_t)hi << 32) | lo;
@@ -5964,6 +5971,7 @@ ice_timesync_disable(struct rte_eth_dev *dev)
struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ice_adapter *ad =
ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc;
uint64_t val;
uint8_t lport;
 
@@ -5971,12 +5979,12 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 
ice_clear_phy_tstamp(hw, lport, 0);
 
-   val = ICE_READ_REG(hw, GLTSYN_ENA(0));
+   val = ICE_READ_REG(hw, GLTSYN_ENA(tmr_idx));
val &= ~GLTSYN_ENA_TSYN_ENA_M;
-   ICE_WRITE_REG(hw, GLTSYN_ENA(0), val);
+   ICE_WRITE_REG(hw, GLTSYN_ENA(tmr_idx), val);
 
-   ICE_WRITE_REG(hw, GLTSYN_INCVAL_L(0), 0);
-   ICE_WRITE_REG(hw, GLTSYN_INCVAL_H(0), 0);
+   ICE_WRITE_REG(hw, GLTSYN_INCVAL_L(tmr_idx), 0);
+   ICE_WRITE_REG(hw, GLTSYN_INCVAL_H(tmr_idx), 0);
 
ad->ptp_ena = 0;
 
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index 4947d5c..94f6bcf 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -349,26 +349,27 @@ static inline
 uint64_t ice_tstamp_convert_32b_64b(struct ice_hw *hw, struct ice_adapter *ad,
uint32_t flag,

RE: [PATCH v2] net/ice: fix incorrect Rx timestamp

2023-03-07 Thread Wu, Wenjun1



> -Original Message-
> From: Su, Simei 
> Sent: Wednesday, March 8, 2023 12:37 PM
> To: Zhang, Qi Z ; Yang, Qiming
> 
> Cc: dev@dpdk.org; Wu, Wenjun1 ; Su, Simei
> ; sta...@dpdk.org
> Subject: [PATCH v2] net/ice: fix incorrect Rx timestamp
> 
> For E822, the time value in Rx Flex Descriptors is 0 due to the missing PHY
> clock timer setup. Also, the source clock index in use is based on device
> capabilities instead of always being zero.
> 
> Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex descriptor")
> Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
> Fixes: fb800fde66f4 ("net/ice/base: work around missing PTP capabilities")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Simei Su 
> ---
> v2:
> * Refine commit title and commit log.
> * Remove duplicate code.
> * Rework share code for "SIMICS_SUPPORT".
> 
>  drivers/net/ice/base/ice_common.c |  4 +---
>  drivers/net/ice/ice_ethdev.c  | 36 ++--
>  drivers/net/ice/ice_rxtx.h| 11 ++-
>  3 files changed, 29 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/ice/base/ice_common.c
> b/drivers/net/ice/base/ice_common.c
> index 5391bd6..1a02aad 100644
> --- a/drivers/net/ice/base/ice_common.c
> +++ b/drivers/net/ice/base/ice_common.c
> @@ -2554,9 +2554,7 @@ ice_parse_1588_func_caps(struct ice_hw *hw,
> struct ice_hw_func_caps *func_p,
>struct ice_aqc_list_caps_elem *cap)  {
>   struct ice_ts_func_info *info = &func_p->ts_func_info;
> - u32 number = ICE_TS_FUNC_ENA_M | ICE_TS_SRC_TMR_OWND_M |
> -  ICE_TS_TMR_ENA_M | ICE_TS_TMR_IDX_OWND_M |
> -  ICE_TS_TMR_IDX_ASSOC_M;
> + u32 number = LE32_TO_CPU(cap->number);
>   u8 clk_freq;
> 
>   ice_debug(hw, ICE_DBG_INIT, "1588 func caps: raw value %x\n",
> number); diff --git a/drivers/net/ice/ice_ethdev.c
> b/drivers/net/ice/ice_ethdev.c index 0d011bb..9a88cf9 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -2413,6 +2413,17 @@ ice_dev_init(struct rte_eth_dev *dev)
>   /* Initialize TM configuration */
>   ice_tm_conf_init(dev);
> 
> + if (ice_is_e810(hw))
> + hw->phy_cfg = ICE_PHY_E810;
> + else
> + hw->phy_cfg = ICE_PHY_E822;
> +
> + if (hw->phy_cfg == ICE_PHY_E822) {
> + ret = ice_start_phy_timer_e822(hw, hw->pf_id, true);
> + if (ret)
> + PMD_INIT_LOG(ERR, "Failed to start phy timer\n");
> + }
> +
>   if (!ad->is_safe_mode) {
>   ret = ice_flow_init(ad);
>   if (ret) {
> @@ -5814,11 +5825,6 @@ ice_timesync_enable(struct rte_eth_dev *dev)
>   return -1;
>   }
> 
> - if (ice_is_e810(hw))
> - hw->phy_cfg = ICE_PHY_E810;
> - else
> - hw->phy_cfg = ICE_PHY_E822;
> -
>   if (hw->func_caps.ts_func_info.src_tmr_owned) {
>   ret = ice_ptp_init_phc(hw);
>   if (ret) {
> @@ -5939,16 +5945,17 @@ ice_timesync_read_time(struct rte_eth_dev
> *dev, struct timespec *ts)
>   struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
>   struct ice_adapter *ad =
>   ICE_DEV_PRIVATE_TO_ADAPTER(dev->data-
> >dev_private);
> + uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc;
>   uint32_t hi, lo, lo2;
>   uint64_t time, ns;
> 
> - lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> - hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
> - lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> + lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
> + hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx));
> + lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
> 
>   if (lo2 < lo) {
> - lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> - hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
> + lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
> + hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx));
>   }
> 
>   time = ((uint64_t)hi << 32) | lo;
> @@ -5964,6 +5971,7 @@ ice_timesync_disable(struct rte_eth_dev *dev)
>   struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
>   struct ice_adapter *ad =
>   ICE_DEV_PRIVATE_TO_ADAPTER(dev->data-
> >dev_private);
> + uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc;
>   uint64_t val;
>   uint8_t lport;
> 
> @@ -5971,12 +5979,12 @@ ice_timesync_disable(struct rte_eth_dev *dev)
> 
>   ice_clear_phy_tstamp(hw, lport, 0);
> 
> - val = ICE_READ_REG(hw, GLTSYN_ENA(0));
> + val = ICE_READ_REG(hw, GLTSYN_ENA(tmr_idx));
>   val &= ~GLTSYN_ENA_TSYN_ENA_M;
> - ICE_WRITE_REG(hw, GLTSYN_ENA(0), val);
> + ICE_WRITE_REG(hw, GLTSYN_ENA(tmr_idx), val);
> 
> - ICE_WRITE_REG(hw, GLTSYN_INCVAL_L(0), 0);
> - ICE_WRITE_REG(hw, GLTSYN_INCVAL_H(0), 0);
> + ICE_WRITE_REG(hw, GLTSYN_INCVAL_L(tmr_idx), 0);
> + ICE_WRITE_REG(hw, GLTSYN_INCVAL_H(tmr_idx), 0);
> 
> 

Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems

2023-03-07 Thread Juraj Linkeš
Hello Qiming, Beilei,

Another reminder - are you looking at this by any chance?

The high level short description is that testpmd/l3fwd breaks a link
between two servers while VPP (using DPDK) doesn't. This leads us to
believe there's a problem with testpmd/l3fwd/i40e driver in DPDK.

Thanks,
Juraj

On Tue, Feb 21, 2023 at 12:18 PM Juraj Linkeš
 wrote:
>
> Hi Qiming,
>
> Just a friendly reminder, would you please take a look?
>
> Thanks,
> Juraj
>
>
> On Tue, Feb 7, 2023 at 3:10 AM Xing, Beilei  wrote:
> >
> > Hi Qiming,
> >
> > Could you please help on this? Thanks.
> >
> > BR,
> > Beilei
> >
> > > -Original Message-
> > > From: Juraj Linkeš 
> > > Sent: Monday, February 6, 2023 4:53 PM
> > > To: Singh, Aman Deep ; Zhang, Yuying
> > > ; Xing, Beilei 
> > > Cc: dev@dpdk.org; Ruifeng Wang ; Zhang, Lijian
> > > ; Honnappa Nagarahalli
> > > 
> > > Subject: Re: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > >
> > > Hello i40e and testpmd maintainers,
> > >
> > > A gentle reminder - would you please advise how to debug the issue 
> > > described
> > > below?
> > >
> > > Thanks,
> > > Juraj
> > >
> > > On Fri, Jan 20, 2023 at 1:07 PM Juraj Linkeš 
> > > wrote:
> > > >
> > > > Adding the logfile.
> > > >
> > > >
> > > >
> > > > One thing that's in the logs but didn't explicitly mention is the DPDK 
> > > > version
> > > we've tried this with:
> > > >
> > > > EAL: RTE Version: 'DPDK 22.07.0'
> > > >
> > > >
> > > >
> > > > We also tried earlier versions going back to 21.08, with no luck. I 
> > > > also did a
> > > quick check on 22.11, also with no luck.
> > > >
> > > >
> > > >
> > > > Juraj
> > > >
> > > >
> > > >
> > > > From: Juraj Linkeš
> > > > Sent: Friday, January 20, 2023 12:56 PM
> > > > To: 'aman.deep.si...@intel.com' ;
> > > > 'yuying.zh...@intel.com' ; Xing, Beilei
> > > > 
> > > > Cc: dev@dpdk.org; Ruifeng Wang ; 'Lijian Zhang'
> > > > ; 'Honnappa Nagarahalli'
> > > > 
> > > > Subject: Testpmd/l3fwd port shutdown failure on Arm Altra systems
> > > >
> > > >
> > > >
> > > > Hello i40e and testpmd maintainers,
> > > >
> > > >
> > > >
> > > > We're hitting an issue with DPDK testpmd on Ampere Altra servers in 
> > > > FD.io
> > > lab.
> > > >
> > > >
> > > >
> > > > A bit of background: along with VPP performance tests (which uses DPDK),
> > > we're running a small number of basic DPDK testpmd and l3fwd tests in 
> > > FD.io
> > > as well. This is to catch any performance differences due to VPP updating 
> > > its
> > > DPDK version.
> > > >
> > > >
> > > >
> > > > We're running both l3fwd tests and testpmd tests. The Altra servers are 
> > > > two
> > > socket and the topology is TG -> DUT1 -> DUT2 -> TG, traffic flows in both
> > > directions, but nothing gets forwarded (with a slight caveat - put a pin 
> > > in this).
> > > There's nothing special in the tests, just forwarding traffic. The NIC 
> > > we're
> > > testing is xl710-QDA2.
> > > >
> > > >
> > > >
> > > > The same tests are passing on all other testbeds - we have various two 
> > > > node
> > > (1 DUT, 1 TG) and three node (2 DUT, 1 TG) Intel and Arm testbeds and with
> > > various NICs (Intel 700 and 800 series and the Intel testbeds use some
> > > Mellanox NICs as well). We don't have quite the same combination of 
> > > another
> > > three node topology with the same NIC though, so it looks like something 
> > > with
> > > testpmd/l3fwd and xl710-QDA2 on Altra servers.
> > > >
> > > >
> > > >
> > > > VPP performance tests are passing, but l3fwd and testpmd fail. This 
> > > > leads us
> > > to believe to it's a software issue, but there could something wrong with 
> > > the
> > > hardware. I'll talk about testpmd from now on, but as far we can tell, the
> > > behavior is the same for testpmd and l3fwd.
> > > >
> > > >
> > > >
> > > > Getting back to the caveat mentioned earlier, there seems to be 
> > > > something
> > > wrong with port shutdown. When running testpmd on a testbed that hasn't
> > > been used for a while it seems that all ports are up right away (we don't 
> > > see
> > > any "Port 0|1: link state change event") and the setup works fine 
> > > (forwarding
> > > works). After restarting testpmd (restarting on one server is 
> > > sufficient), the
> > > ports between DUT1 and DUT2 (but not between DUTs and TG) go down and
> > > are not usable in DPDK, VPP or in Linux (with i40e kernel driver) for a 
> > > while
> > > (measured in minutes, sometimes dozens of minutes; the duration is 
> > > seemingly
> > > random). The ports eventually recover and can be used again, but there's
> > > nothing in syslog suggesting what happened.
> > > >
> > > >
> > > >
> > > > What seems to be happening is testpmd put the ports into some faulty 
> > > > state.
> > > This only happens on the DUT1 -> DUT2 link though (the ports between the
> > > two testpmds), not on TG -> DUT1 link (the TG port is left alone).
> > > >
> > > >
> > > >
> > > > Some more info:
> > > >
> > > > We've come across the issue with this configurat

RE: [PATCH v2] net/ice: fix incorrect Rx timestamp

2023-03-07 Thread Zhang, Qi Z



> -Original Message-
> From: Wu, Wenjun1 
> Sent: Wednesday, March 8, 2023 2:00 PM
> To: Su, Simei ; Zhang, Qi Z ;
> Yang, Qiming 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH v2] net/ice: fix incorrect Rx timestamp
> 
> 
> 
> > -Original Message-
> > From: Su, Simei 
> > Sent: Wednesday, March 8, 2023 12:37 PM
> > To: Zhang, Qi Z ; Yang, Qiming
> > 
> > Cc: dev@dpdk.org; Wu, Wenjun1 ; Su, Simei
> > ; sta...@dpdk.org
> > Subject: [PATCH v2] net/ice: fix incorrect Rx timestamp
> >
> > For E822, the time value in Rx Flex Descriptors is 0 due to the
> > missing PHY clock timer setup. Also, the source clock index in use is
> > based on device capabilities instead of always being zero.
> >
> > Fixes: 953e74e6b73a ("net/ice: enable Rx timestamp on flex
> > descriptor")
> > Fixes: 646dcbe6c701 ("net/ice: support IEEE 1588 PTP")
> > Fixes: fb800fde66f4 ("net/ice/base: work around missing PTP
> > capabilities")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Simei Su 
> > ---
> > v2:
> > * Refine commit title and commit log.
> > * Remove duplicate code.
> > * Rework share code for "SIMICS_SUPPORT".
> >
> >  drivers/net/ice/base/ice_common.c |  4 +---
> >  drivers/net/ice/ice_ethdev.c  | 36 ++--
> >  drivers/net/ice/ice_rxtx.h| 11 ++-
> >  3 files changed, 29 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/net/ice/base/ice_common.c
> > b/drivers/net/ice/base/ice_common.c
> > index 5391bd6..1a02aad 100644
> > --- a/drivers/net/ice/base/ice_common.c
> > +++ b/drivers/net/ice/base/ice_common.c
> > @@ -2554,9 +2554,7 @@ ice_parse_1588_func_caps(struct ice_hw *hw,
> > struct ice_hw_func_caps *func_p,
> >  struct ice_aqc_list_caps_elem *cap)  {
> > struct ice_ts_func_info *info = &func_p->ts_func_info;
> > -   u32 number = ICE_TS_FUNC_ENA_M | ICE_TS_SRC_TMR_OWND_M |
> > -ICE_TS_TMR_ENA_M | ICE_TS_TMR_IDX_OWND_M |
> > -ICE_TS_TMR_IDX_ASSOC_M;
> > +   u32 number = LE32_TO_CPU(cap->number);
> > u8 clk_freq;
> >
> > ice_debug(hw, ICE_DBG_INIT, "1588 func caps: raw value %x\n",
> > number); diff --git a/drivers/net/ice/ice_ethdev.c
> > b/drivers/net/ice/ice_ethdev.c index 0d011bb..9a88cf9 100644
> > --- a/drivers/net/ice/ice_ethdev.c
> > +++ b/drivers/net/ice/ice_ethdev.c
> > @@ -2413,6 +2413,17 @@ ice_dev_init(struct rte_eth_dev *dev)
> > /* Initialize TM configuration */
> > ice_tm_conf_init(dev);
> >
> > +   if (ice_is_e810(hw))
> > +   hw->phy_cfg = ICE_PHY_E810;
> > +   else
> > +   hw->phy_cfg = ICE_PHY_E822;
> > +
> > +   if (hw->phy_cfg == ICE_PHY_E822) {
> > +   ret = ice_start_phy_timer_e822(hw, hw->pf_id, true);
> > +   if (ret)
> > +   PMD_INIT_LOG(ERR, "Failed to start phy timer\n");
> > +   }
> > +
> > if (!ad->is_safe_mode) {
> > ret = ice_flow_init(ad);
> > if (ret) {
> > @@ -5814,11 +5825,6 @@ ice_timesync_enable(struct rte_eth_dev *dev)
> > return -1;
> > }
> >
> > -   if (ice_is_e810(hw))
> > -   hw->phy_cfg = ICE_PHY_E810;
> > -   else
> > -   hw->phy_cfg = ICE_PHY_E822;
> > -
> > if (hw->func_caps.ts_func_info.src_tmr_owned) {
> > ret = ice_ptp_init_phc(hw);
> > if (ret) {
> > @@ -5939,16 +5945,17 @@ ice_timesync_read_time(struct rte_eth_dev
> > *dev, struct timespec *ts)
> > struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data-
> > >dev_private);
> > struct ice_adapter *ad =
> > ICE_DEV_PRIVATE_TO_ADAPTER(dev->data-
> > >dev_private);
> > +   uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc;
> > uint32_t hi, lo, lo2;
> > uint64_t time, ns;
> >
> > -   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> > -   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
> > -   lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> > +   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
> > +   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx));
> > +   lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
> >
> > if (lo2 < lo) {
> > -   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(0));
> > -   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(0));
> > +   lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx));
> > +   hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx));
> > }
> >
> > time = ((uint64_t)hi << 32) | lo;
> > @@ -5964,6 +5971,7 @@ ice_timesync_disable(struct rte_eth_dev *dev)
> > struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data-
> > >dev_private);
> > struct ice_adapter *ad =
> > ICE_DEV_PRIVATE_TO_ADAPTER(dev->data-
> > >dev_private);
> > +   uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc;
> > uint64_t val;
> > uint8_t lport;
> >
> > @@ -5971,12 +5979,12 @@ ice_timesync_disable(struct rte_eth_dev *dev)
> >
> > ice_clear_phy_tstamp(hw, lport, 0);
> >
> > -   val = ICE_READ_REG(hw, GLTSYN_ENA(0));
> > +   val = ICE_READ_REG(hw, GLTSYN_ENA(tmr_

Re: [PATCH 6/9] net/hns3: fix segment fault when parse runtime config

2023-03-07 Thread Dongdong Liu



On 2023/3/2 15:50, Chengwen Feng wrote:

The rte_kvargs_process() was used to parse KV pairs, it also supports
to parse 'only keys' (e.g. socket_id) type. And the callback function
parameter 'value' is NULL when parsed 'only keys'.

This patch fixes segment fault when parse input args with 'only keys'
(e.g. rx_func_hint).

Fixes: a124f9e9591b ("net/hns3: add runtime config to select IO burst function")
Fixes: 70791213242e ("net/hns3: support masking device capability")
Fixes: 2fc3e696a7f1 ("net/hns3: add runtime config for mailbox limit time")
Cc: sta...@dpdk.org

Signed-off-by: Chengwen Feng 


Acked-by: Dongdong Liu 

Thanks,
Dongdong