Re: [dpdk-dev] code.dpdk.org bugfix releases

2021-06-04 Thread David Marchand
On Thu, Jun 3, 2021 at 9:18 PM Thomas Monjalon  wrote:
>
> 03/06/2021 20:15, Morten Brørup:
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > > Sent: Thursday, 3 June 2021 18.11
> > >
> > > 03/06/2021 17:18, Morten Brørup:
> > > > Hi all,
> > > >
> > > > Bugfix releases are missing on code.dpdk.org.
> > > >
> > > > E.g. version 17.11 is present, but version 17.11.10 is missing.
> > > >
> > > > Can whoever is the maintainer of code.dpdk.org please fix this.
> > >
> > > Ali is the admin of dpdk.org.
> > > I'm not sure your request is easy because stable releases
> > > are in a separate repository dpdk-stable.git.
> >
> > Ali, it would be great if you can include them.
> > Not having easy access to browse the source code of the LTS
> > bugfix releases could be a barrier for DPDK users to upgrade
> > to the latest LTS bugfix release.
>
> Actually it's already included (thanks Ali for reminding it to me):
> https://code.dpdk.org/dpdk-stable/v20.11/source
>
> There are 2 different repos and that's how it is exposed in this tool.
> It may be a bit confusing.
> Do you think it would be OK to rename dpdk-stable to dpdk in this tool?

If we do this, in the current state, we lose the non-stable branches.

19.02 and 19.05 (at least) are not present in the dpdk-stable part.
http://code.dpdk.org/dpdk-stable/v19.11.8/source

Not sure how to fix this, do we simply need to push release tags to dpdk-stable?


-- 
David Marchand



Re: [dpdk-dev] code.dpdk.org bugfix releases

2021-06-04 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of David Marchand
> Sent: Friday, 4 June 2021 09.01
> 
> On Thu, Jun 3, 2021 at 9:18 PM Thomas Monjalon 
> wrote:
> >
> > 03/06/2021 20:15, Morten Brørup:
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> Monjalon
> > > > Sent: Thursday, 3 June 2021 18.11
> > > >
> > > > 03/06/2021 17:18, Morten Brørup:
> > > > > Hi all,
> > > > >
> > > > > Bugfix releases are missing on code.dpdk.org.
> > > > >
> > > > > E.g. version 17.11 is present, but version 17.11.10 is missing.
> > > > >
> > > > > Can whoever is the maintainer of code.dpdk.org please fix this.
> > > >
> > > > Ali is the admin of dpdk.org.
> > > > I'm not sure your request is easy because stable releases
> > > > are in a separate repository dpdk-stable.git.
> > >
> > > Ali, it would be great if you can include them.
> > > Not having easy access to browse the source code of the LTS
> > > bugfix releases could be a barrier for DPDK users to upgrade
> > > to the latest LTS bugfix release.
> >
> > Actually it's already included (thanks Ali for reminding it to me):
> > https://code.dpdk.org/dpdk-stable/v20.11/source
> >
> > There are 2 different repos and that's how it is exposed in this
> tool.
> > It may be a bit confusing.
> > Do you think it would be OK to rename dpdk-stable to dpdk in this
> tool?
> 
> If we do this, in the current state, we lose the non-stable branches.
> 
> 19.02 and 19.05 (at least) are not present in the dpdk-stable part.
> http://code.dpdk.org/dpdk-stable/v19.11.8/source

And even worse, 21.xx are also not present in the dpdk-stable part.

> 
> Not sure how to fix this, do we simply need to push release tags to
> dpdk-stable?



Re: [dpdk-dev] [PATCH 0/2] provide thread unsafe async registration functions

2021-06-04 Thread Maxime Coquelin



On 5/28/21 10:11 AM, Jiayu Hu wrote:
> Lock protection is needed while vhost notifies the application of
> device readiness, so the first patch adds lock protection. Once that
> locking is in place, the existing async vhost registration functions
> would deadlock, as they acquire the same lock. The second patch
> therefore provides thread-unsafe registration functions that can be
> called from within vhost callback functions.
> 
> Jiayu Hu (2):
>   vhost: fix lock on device readiness notification
>   vhost: add thread unsafe async registration functions
> 
>  doc/guides/prog_guide/vhost_lib.rst |  12 +++
>  lib/vhost/rte_vhost_async.h |  42 ++
>  lib/vhost/version.map   |   4 +
>  lib/vhost/vhost.c   | 161 +++-
>  lib/vhost/vhost_user.c  |   5 +-
>  5 files changed, 180 insertions(+), 44 deletions(-)
> 



Re: [dpdk-dev] [PATCH v2] net/ice: fix data path corrupt on secondary process

2021-06-04 Thread Zhang, Qi Z



> -Original Message-
> From: Wang, Yixue 
> Sent: Friday, June 4, 2021 2:52 PM
> To: Zhang, Qi Z ; Yang, Qiming
> 
> Cc: Zhang, Liheng ; Dong, Yao
> ; dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH v2] net/ice: fix data path corrupt on secondary process
> 
> Hi Qi,
> 
> Patch v2 has been tested.
> 
> Best Regards,
> Yixue.
> 
> > -Original Message-
> > From: Zhang, Qi Z 
> > Sent: Wednesday, May 26, 2021 14:13
> > To: Yang, Qiming 
> > Cc: Zhang, Liheng ; Wang, Yixue
> > ; Dong, Yao ; dev@dpdk.org;
> > Zhang, Qi Z ; sta...@dpdk.org
> > Subject: [PATCH v2] net/ice: fix data path corrupt on secondary
> > process
> >
> > The rte_eth_devices array is not in shared memory, so it should not be
> > referenced by ice_adapter, which is shared by the primary and secondary
> > processes. Any process that sets ice_adapter->eth_dev will corrupt
> > another process's context.
> >
> > The patch removes the field "eth_dev" from ice_adapter.
> > Now, when the data paths try to access the rte_eth_dev_data instance,
> > they should replace adapter->eth_dev->data with adapter->pf.dev_data.
> >
> > Fixes: f9cf4f864150 ("net/ice: support device initialization")
> > Cc: sta...@dpdk.org
> >
> > Reported-by: Yixue Wang 
> > Signed-off-by: Qi Zhang 
Tested-by: Yixue Wang 

Applied to dpdk-next-net-intel.

Thanks
Qi
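
To make the bug class in the quoted commit message concrete, below is a
minimal standalone model; the struct and function names are invented for
illustration and are not the driver code. A pointer into a per-process
array, stored in shared memory, is only meaningful in the process that
stored it, while resolving the object locally, as the fix does through
pf.dev_data, stays correct in both primary and secondary processes.

/*
 * Illustrative standalone model, not the ice driver code.
 */
#include <stdio.h>

struct eth_dev { int port_id; };		/* per-process object */
static struct eth_dev local_devices[4];		/* like rte_eth_devices: not shared */

struct adapter {				/* like ice_adapter: in shared memory */
	struct eth_dev *eth_dev;		/* BAD: only valid in the process that set it */
	int port_id;				/* OK: resolved locally by each process */
};

static struct eth_dev *
adapter_dev(const struct adapter *ad)
{
	return &local_devices[ad->port_id];	/* local lookup, valid in any process */
}

int
main(void)
{
	struct adapter shared = { .eth_dev = &local_devices[0], .port_id = 0 };

	/* In a secondary process, shared.eth_dev would point into the primary's
	 * address space; adapter_dev() keeps working because it resolves locally. */
	printf("local lookup: port %d\n", adapter_dev(&shared)->port_id);
	return 0;
}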


Re: [dpdk-dev] [PATCH 0/2] provide thread unsafe async registration functions

2021-06-04 Thread Maxime Coquelin
Sorry for the previous blank reply.

On 5/28/21 10:11 AM, Jiayu Hu wrote:
> Lock protection is needed while vhost notifies the application of
> device readiness, so the first patch adds lock protection. Once that
> locking is in place, the existing async vhost registration functions
> would deadlock, as they acquire the same lock. The second patch
> therefore provides thread-unsafe registration functions that can be
> called from within vhost callback functions.

I agree the callback should always be protected, and in that case having
a new thread-unsafe API makes sense for async registration.
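
For readers following along, here is a minimal sketch of the deadlock
being discussed, using a plain pthread mutex to stand in for the vhost
lock; all names are illustrative, this is not the actual vhost code.

/*
 * A registration helper that takes the lock deadlocks when called from a
 * callback already running under that lock; a thread-unsafe variant that
 * assumes the caller holds the lock does not.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t vq_lock = PTHREAD_MUTEX_INITIALIZER;

static int
register_async_safe(void)
{
	pthread_mutex_lock(&vq_lock);	/* would never return if called from the callback */
	/* ... set up the async channel ... */
	pthread_mutex_unlock(&vq_lock);
	return 0;
}

static int
register_async_unsafe(void)
{
	/* caller must already hold vq_lock */
	/* ... set up the async channel ... */
	return 0;
}

/* Device-readiness callback: with patch 1 it runs under vq_lock. */
static void
on_device_ready(void)
{
	/* register_async_safe();  would deadlock: the lock is not recursive */
	register_async_unsafe();   /* fine: the lock is already held by the caller */
}

int
main(void)
{
	pthread_mutex_lock(&vq_lock);
	on_device_ready();
	pthread_mutex_unlock(&vq_lock);
	printf("registered without deadlock\n");
	return 0;
}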

Regarding backport, I'm not sure what we should do.

Backporting a new API is a no-go, but if we only backport patch 1, the
async feature will always be broken on 20.11 LTS, right?

What do you think?

Thanks,
Maxime

> Jiayu Hu (2):
>   vhost: fix lock on device readiness notification
>   vhost: add thread unsafe async registration functions
> 
>  doc/guides/prog_guide/vhost_lib.rst |  12 +++
>  lib/vhost/rte_vhost_async.h |  42 ++
>  lib/vhost/version.map   |   4 +
>  lib/vhost/vhost.c   | 161 +++-
>  lib/vhost/vhost_user.c  |   5 +-
>  5 files changed, 180 insertions(+), 44 deletions(-)
> 



Re: [dpdk-dev] [PATCH v2 2/2] eal: handle compressed firmwares

2021-06-04 Thread David Marchand
On Fri, Jun 4, 2021 at 12:29 AM Dmitry Kozlyuk  wrote:
>
> 2021-06-03 18:55 (UTC+0200), David Marchand:
> [...]
> > diff --git a/config/meson.build b/config/meson.build
> > index 017bb2efbb..c6985139b4 100644
> > --- a/config/meson.build
> > +++ b/config/meson.build
> > @@ -172,6 +172,13 @@ if libexecinfo.found() and cc.has_header('execinfo.h')
> >  dpdk_extra_ldflags += '-lexecinfo'
> >  endif
> >
> > +libarchive = dependency('libarchive', required: false, method: 
> > 'pkg-config')
> > +if libarchive.found()
> > +dpdk_conf.set('RTE_HAS_LIBARCHIVE', 1)
> > +add_project_link_arguments('-larchive', language: 'c')
> > +dpdk_extra_ldflags += '-larchive'
> > +endif
> > +
>
> Suggestion:
>
> diff --git a/config/meson.build b/config/meson.build
> index c6985139b4..c3668798c1 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -175,7 +175,6 @@ endif
>  libarchive = dependency('libarchive', required: false, method: 'pkg-config')
>  if libarchive.found()
>  dpdk_conf.set('RTE_HAS_LIBARCHIVE', 1)
> -add_project_link_arguments('-larchive', language: 'c')
>  dpdk_extra_ldflags += '-larchive'
>  endif
>
> diff --git a/lib/eal/meson.build b/lib/eal/meson.build
> index 1722924f67..5a018d97d6 100644
> --- a/lib/eal/meson.build
> +++ b/lib/eal/meson.build
> @@ -16,6 +16,7 @@ subdir(exec_env)
>  subdir(arch_subdir)
>
>  deps += ['kvargs']
> +ext_deps += libarchive
>  if not is_windows
>  deps += ['telemetry']
>  endif
>

I had tried something similar when preparing v2 (only keeping
RTE_HAS_LIBARCHIVE in config/meson.build and putting extra_ldflags and
ext_deps in lib/eal/unix/meson.build), but both my attempt and your
suggestion break static compilation for the helloworld example.


$ ./devtools/test-meson-builds.sh -vv
...
## Building helloworld
gmake: Entering directory
'/home/dmarchan/builds/build-x86-generic/install/usr/local/share/dpdk/examples/helloworld'
rm -f build/helloworld build/helloworld-static build/helloworld-shared
test -d build && rmdir -p build || true
cc  -I/home/dmarchan/intel-ipsec-mb/install/include -O3
-I/home/dmarchan/builds/build-x86-generic/install/usr/local/include
-include rte_config.h -march=nehalem -I/usr/usr/include
-I/opt/isa-l/include  -DALLOW_EXPERIMENTAL_API main.c -o
build/helloworld-shared  -L/home/dmarchan/intel-ipsec-mb/install/lib
-Wl,--as-needed
-L/home/dmarchan/builds/build-x86-generic/install/usr/local/lib
-lrte_node -lrte_graph -lrte_bpf -lrte_flow_classify -lrte_pipeline
-lrte_table -lrte_port -lrte_fib -lrte_ipsec -lrte_vhost -lrte_stack
-lrte_security -lrte_sched -lrte_reorder -lrte_rib -lrte_regexdev
-lrte_rawdev -lrte_pdump -lrte_power -lrte_member -lrte_lpm
-lrte_latencystats -lrte_kni -lrte_jobstats -lrte_ip_frag -lrte_gso
-lrte_gro -lrte_eventdev -lrte_efd -lrte_distributor -lrte_cryptodev
-lrte_compressdev -lrte_cfgfile -lrte_bitratestats -lrte_bbdev
-lrte_acl -lrte_timer -lrte_hash -lrte_metrics -lrte_cmdline -lrte_pci
-lrte_ethdev -lrte_meter -lrte_net -lrte_mbuf -lrte_mempool -lrte_rcu
-lrte_ring -lrte_eal -lrte_telemetry -lrte_kvargs -lbsd
ln -sf helloworld-shared build/helloworld
cc  -I/home/dmarchan/intel-ipsec-mb/install/include -O3
-I/home/dmarchan/builds/build-x86-generic/install/usr/local/include
-include rte_config.h -march=nehalem -I/usr/usr/include
-I/opt/isa-l/include  -DALLOW_EXPERIMENTAL_API main.c -o
build/helloworld-static  -L/home/dmarchan/intel-ipsec-mb/install/lib
-Wl,--whole-archive
-L/home/dmarchan/builds/build-x86-generic/install/usr/local/lib
-l:librte_common_cpt.a -l:librte_common_dpaax.a
-l:librte_common_iavf.a -l:librte_common_octeontx.a
-l:librte_common_octeontx2.a -l:librte_bus_dpaa.a
-l:librte_bus_fslmc.a -l:librte_bus_ifpga.a -l:librte_bus_pci.a
-l:librte_bus_vdev.a -l:librte_bus_vmbus.a -l:librte_common_cnxk.a
-l:librte_common_mlx5.a -l:librte_common_qat.a
-l:librte_common_sfc_efx.a -l:librte_mempool_bucket.a
-l:librte_mempool_cnxk.a -l:librte_mempool_dpaa.a
-l:librte_mempool_dpaa2.a -l:librte_mempool_octeontx.a
-l:librte_mempool_octeontx2.a -l:librte_mempool_ring.a
-l:librte_mempool_stack.a -l:librte_net_af_packet.a
-l:librte_net_af_xdp.a -l:librte_net_ark.a -l:librte_net_atlantic.a
-l:librte_net_avp.a -l:librte_net_axgbe.a -l:librte_net_bnx2x.a
-l:librte_net_bnxt.a -l:librte_net_bond.a -l:librte_net_cxgbe.a
-l:librte_net_dpaa.a -l:librte_net_dpaa2.a -l:librte_net_e1000.a
-l:librte_net_ena.a -l:librte_net_enetc.a -l:librte_net_enic.a
-l:librte_net_failsafe.a -l:librte_net_fm10k.a -l:librte_net_hinic.a
-l:librte_net_hns3.a -l:librte_net_i40e.a -l:librte_net_iavf.a
-l:librte_net_ice.a -l:librte_net_igc.a -l:librte_net_ionic.a
-l:librte_net_ipn3ke.a -l:librte_net_ixgbe.a -l:librte_net_kni.a
-l:librte_net_liquidio.a -l:librte_net_memif.a -l:librte_net_mlx4.a
-l:librte_net_mlx5.a -l:librte_net_netvsc.a -l:librte_net_nfp.a
-l:librte_net_null.a -l:librte_net_octeontx.a
-l:librte_net_octeontx2.a -l:librte_net_octeontx_ep.a
-l:librte_net_pcap.a -l:librte_net_pfe.a -l:librte_net_qede.a
-l:librte_net_ring.a -l:

[dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW scanning func

2021-06-04 Thread Joyce Kong
Add logic to determine how many DD bits have been set for contiguous
packets, so that the SMP barrier can be removed while reading
descriptors.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 drivers/net/i40e/i40e_rxtx.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decec..410a81f30 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -452,7 +452,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
uint16_t pkt_len;
uint64_t qword1;
uint32_t rx_status;
-   int32_t s[I40E_LOOK_AHEAD], nb_dd;
+   int32_t s[I40E_LOOK_AHEAD], var, nb_dd;
int32_t i, j, nb_rx = 0;
uint64_t pkt_flags;
uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl;
@@ -482,11 +482,14 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
I40E_RXD_QW1_STATUS_SHIFT;
}
 
-   rte_smp_rmb();
-
/* Compute how many status bits were set */
-   for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
-   nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+   for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {
+   var = s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+   if (var)
+   nb_dd += 1;
+   else
+   break;
+   }
 
nb_rx += nb_dd;
 
-- 
2.17.1



Re: [dpdk-dev] code.dpdk.org bugfix releases

2021-06-04 Thread Thomas Monjalon
03/06/2021 21:36, Morten Brørup:
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > Sent: Thursday, 3 June 2021 21.18
> > Subject: RE: [dpdk-dev] code.dpdk.org bugfix releases
> > 
> > 03/06/2021 20:15, Morten Brørup:
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > Monjalon
> > > > Sent: Thursday, 3 June 2021 18.11
> > > >
> > > > 03/06/2021 17:18, Morten Brørup:
> > > > > Hi all,
> > > > >
> > > > > Bugfix releases are missing on code.dpdk.org.
> > > > >
> > > > > E.g. version 17.11 is present, but version 17.11.10 is missing.
> > > > >
> > > > > Can whoever is the maintainer of code.dpdk.org please fix this.
> > > >
> > > > Ali is the admin of dpdk.org.
> > > > I'm not sure your request is easy because stable releases
> > > > are in a separate repository dpdk-stable.git.
> > >
> > > Ali, it would be great if you can include them.
> > > Not having easy access to browse the source code of the LTS
> > > bugfix releases could be a barrier for DPDK users to upgrade
> > > to the latest LTS bugfix release.
> > 
> > Actually it's already included (thanks Ali for reminding it to me):
> > https://code.dpdk.org/dpdk-stable/v20.11/source
> > 
> > There are 2 different repos and that's how it is exposed in this tool.
> 
> Great... It was just me not knowing where to click. We learn something new 
> every day!
> 
> > It may be a bit confusing.
> > Do you think it would be OK to rename dpdk-stable to dpdk in this tool?
> 
> Having learned that the box saying "dpdk" is a dropdown, I headed over to my 
> favorite Linux source code browser at Bootlin. And they already have what I 
> was asking for: https://elixir.bootlin.com/dpdk/latest/source
> 
> And they call it "dpdk", not "dpdk-stable"... so I support Thomas' suggestion 
> here.

No, what is in bootlin instance is really dpdk, not dpdk-stable.

> Or alternatively: https://code.dpdk.org/ currently redirects to 
> https://code.dpdk.org/dpdk/latest/source. Perhaps it could redirect to 
> https://code.dpdk.org/dpdk-stable/latest/source instead.
> 
> Now, I have another question: What does code.dpdk.org show for "dpdk" that it 
> does not show for "dpdk-stable"?

stable releases are only in dpdk-stable but everything is in dpdk-stable.




Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 07:51, Wang, Haiyue:
> > From: Elena Agostini 
> > 
> > The new library gpudev is for dealing with GPU from a DPDK application
> > in a vendor-agnostic way.
> > 
> > As a first step, the features are focused on memory management.
> > A function allows to allocate memory inside the GPU,
> > while another one allows to use main (CPU) memory from the GPU.
> > 
> > The infrastructure is prepared to welcome drivers in drivers/gpu/
> > as the upcoming NVIDIA one, implementing the gpudev API.
> > Other additions planned for next revisions:
> >   - C implementation file
> >   - guide documentation
> >   - unit tests
> >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > 
> > The next step should focus on GPU processing task control.
> > 
> 
> Is this patch for 'L2FWD-NV Workload on GPU' on P26 ?
> https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9730-packet-processing-on-gpu-at-100gbe-line-rate.pdf

Yes this is the same project: use GPU in DPDK workload.




Re: [dpdk-dev] code.dpdk.org bugfix releases

2021-06-04 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, 4 June 2021 10.12
> 
> 03/06/2021 21:36, Morten Brørup:
> > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > Sent: Thursday, 3 June 2021 21.18
> > > Subject: RE: [dpdk-dev] code.dpdk.org bugfix releases
> > >
> > > 03/06/2021 20:15, Morten Brørup:
> > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > > Monjalon
> > > > > Sent: Thursday, 3 June 2021 18.11
> > > > >
> > > > > 03/06/2021 17:18, Morten Brørup:
> > > > > > Hi all,
> > > > > >
> > > > > > Bugfix releases are missing on code.dpdk.org.
> > > > > >
> > > > > > E.g. version 17.11 is present, but version 17.11.10 is
> missing.
> > > > > >
> > > > > > Can whoever is the maintainer of code.dpdk.org please fix
> this.
> > > > >
> > > > > Ali is the admin of dpdk.org.
> > > > > I'm not sure your request is easy because stable releases
> > > > > are in a separate repository dpdk-stable.git.
> > > >
> > > > Ali, it would be great if you can include them.
> > > > Not having easy access to browse the source code of the LTS
> > > > bugfix releases could be a barrier for DPDK users to upgrade
> > > > to the latest LTS bugfix release.
> > >
> > > Actually it's already included (thanks Ali for reminding it to me):
> > >   https://code.dpdk.org/dpdk-stable/v20.11/source
> > >
> > > There are 2 different repos and that's how it is exposed in this
> tool.
> >
> > Great... It was just me not knowing where to click. We learn
> something new every day!
> >
> > > It may be a bit confusing.
> > > Do you think it would be OK to rename dpdk-stable to dpdk in this
> tool?
> >
> > Having learned that the box saying "dpdk" is a dropdown, I headed
> over to my favorite Linux source code browser at Bootlin. And they
> already have what I was asking for:
> https://elixir.bootlin.com/dpdk/latest/source
> >
> > And they call it "dpdk", not "dpdk-stable"... so I support Thomas'
> suggestion here.
> 
> No, what is in bootlin instance is really dpdk, not dpdk-stable.

You are right. I got confused because they also showed the -rc releases.

> 
> > Or alternatively: https://code.dpdk.org/ currently redirects to
> https://code.dpdk.org/dpdk/latest/source. Perhaps it could redirect to
> https://code.dpdk.org/dpdk-stable/latest/source instead.
> >
> > Now, I have another question: What does code.dpdk.org show for "dpdk"
> that it does not show for "dpdk-stable"?
> 
> stable releases are only in dpdk-stable but everything is in dpdk-
> stable.

If you meant "dpdk" (not "dpdk-stable") at the end of that sentence, then 
"dpdk" at code.dpdk.org should show many more tags, as I originally requested.



Re: [dpdk-dev] code.dpdk.org bugfix releases

2021-06-04 Thread Thomas Monjalon
04/06/2021 10:19, Morten Brørup:
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > > Now, I have another question: What does code.dpdk.org show for "dpdk"
> > that it does not show for "dpdk-stable"?
> > 
> > stable releases are only in dpdk-stable but everything is in dpdk-
> > stable.
> 
> If you meant "dpdk" (not "dpdk-stable") at the end of that sentence, then 
> "dpdk" at code.dpdk.org should show many more tags, as I originally requested.

No I meant dpdk-stable :)
In DPDK we have base releases,
in dpdk-stable we have the maintained base releases and the minor stable 
releases.
At the end, we don't have all tags in a single repo.




Re: [dpdk-dev] code.dpdk.org bugfix releases

2021-06-04 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, 4 June 2021 10.48
> 
> 04/06/2021 10:19, Morten Brørup:
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> Monjalon
> > > > Now, I have another question: What does code.dpdk.org show for
> "dpdk"
> > > that it does not show for "dpdk-stable"?
> > >
> > > stable releases are only in dpdk-stable but everything is in dpdk-
> > > stable.
> >
> > If you meant "dpdk" (not "dpdk-stable") at the end of that sentence,
> then "dpdk" at code.dpdk.org should show many more tags, as I
> originally requested.
> 
> No I meant dpdk-stable :)
> In DPDK we have base releases,
> in dpdk-stable we have the maintained base releases and the minor
> stable releases.
> At the end, we don't have all tags in a single repo.

OK. Now I get it!

Then code.dpdk.org probably works as intended. It was just me being confused 
about it, because I'm used to a mono-repo world. "It's a feature, not a bug."

I guess it would be difficult to merge the data from the two repos into one big 
code.dpdk.org source code browser. Which was your first response to my original 
request, Thomas. Ali, what do you think?

-Morten


[dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test

2021-06-04 Thread Joyce Kong
Since C11 memory model is adopted in DPDK now[1], use GCC's 
atomic builtins in test cases.

[1]https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

Joyce Kong (8):
  test/ticketlock: use GCC atomic builtins for lcores sync
  test/spinlock: use GCC atomic builtins for lcores sync
  test/rwlock: use GCC atomic builtins for lcores sync
  test/mcslock: use GCC atomic builtins for lcores sync
  test/mempool: remove unused variable for lcores sync
  test/mempool_perf: use GCC atomic builtins for lcores sync
  test/service_cores: use GCC atomic builtins for lock sync
  test/rcu_perf: use GCC atomic builtins for data sync

 app/test/test_mcslock.c   | 13 +++--
 app/test/test_mempool.c   |  5 --
 app/test/test_mempool_perf.c  | 12 ++---
 app/test/test_rcu_qsbr_perf.c | 98 +--
 app/test/test_rwlock.c|  9 ++--
 app/test/test_service_cores.c | 36 +++--
 app/test/test_spinlock.c  | 10 ++--
 app/test/test_ticketlock.c|  9 ++--
 8 files changed, 93 insertions(+), 99 deletions(-)

-- 
2.17.1
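
The conversion pattern this series applies is the same in every patch;
the following standalone sketch shows it outside DPDK, with plain
pthreads instead of lcores and invented names, purely for illustration.

/*
 * A plain uint32_t plus GCC C11 atomic builtins replaces rte_atomic32_t
 * for the start-flag synchronisation between the main thread and workers.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t synchro;	/* was: static rte_atomic32_t synchro; */

static void *
worker(void *arg)
{
	(void)arg;
	/* was: while (rte_atomic32_read(&synchro) == 0); */
	while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
		;
	puts("worker released");
	return NULL;
}

int
main(void)
{
	pthread_t t;

	/* was: rte_atomic32_set(&synchro, 0); */
	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	pthread_create(&t, NULL, worker, NULL);

	/* was: rte_atomic32_set(&synchro, 1); -- release the worker */
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
	pthread_join(&t, NULL);
	return 0;
}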



[dpdk-dev] [PATCH v1 1/8] test/ticketlock: use GCC atomic builtins for lcores sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins for lcores sync
in ticketlock testcases.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Honnappa Nagarahalli 
---
 app/test/test_ticketlock.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ticketlock.c b/app/test/test_ticketlock.c
index 7aab8665b..9aa212fa9 100644
--- a/app/test/test_ticketlock.c
+++ b/app/test/test_ticketlock.c
@@ -9,7 +9,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include 
 #include 
@@ -49,7 +48,7 @@ static rte_ticketlock_t tl_tab[RTE_MAX_LCORE];
 static rte_ticketlock_recursive_t tlr;
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_ticketlock_per_core(__rte_unused void *arg)
@@ -112,7 +111,7 @@ load_loop_fn(void *func_param)
 
/* wait synchro for workers */
if (lcore != rte_get_main_lcore())
-   while (rte_atomic32_read(&synchro) == 0)
+   while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
;
 
begin = rte_rdtsc_precise();
@@ -155,11 +154,11 @@ test_ticketlock_perf(void)
printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
/* Clear synchro and start workers */
-   rte_atomic32_set(&synchro, 0);
+   __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
/* start synchro and launch test on main */
-   rte_atomic32_set(&synchro, 1);
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
load_loop_fn(&lock);
 
rte_eal_mp_wait_lcore();
-- 
2.17.1



[dpdk-dev] [PATCH v1 2/8] test/spinlock: use GCC atomic builtins for lcores sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins for lcores sync
in spinlock testcases.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_spinlock.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/app/test/test_spinlock.c b/app/test/test_spinlock.c
index 054fb43a9..77b9b7086 100644
--- a/app/test/test_spinlock.c
+++ b/app/test/test_spinlock.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "test.h"
 
@@ -49,7 +48,7 @@ static rte_spinlock_t sl_tab[RTE_MAX_LCORE];
 static rte_spinlock_recursive_t slr;
 static unsigned count = 0;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_spinlock_per_core(__rte_unused void *arg)
@@ -111,7 +110,8 @@ load_loop_fn(void *func_param)
 
/* wait synchro for workers */
if (lcore != rte_get_main_lcore())
-   while (rte_atomic32_read(&synchro) == 0);
+   while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
+   ;
 
begin = rte_get_timer_cycles();
while (lcount < MAX_LOOP) {
@@ -150,11 +150,11 @@ test_spinlock_perf(void)
printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
/* Clear synchro and start workers */
-   rte_atomic32_set(&synchro, 0);
+   __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
/* start synchro and launch test on main */
-   rte_atomic32_set(&synchro, 1);
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
load_loop_fn(&lock);
 
rte_eal_mp_wait_lcore();
-- 
2.17.1



[dpdk-dev] [PATCH v1 3/8] test/rwlock: use GCC atomic builtins for lcores sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins for lcores sync
in rwlock testcases.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_rwlock.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_rwlock.c b/app/test/test_rwlock.c
index b47150a86..ef89ae44c 100644
--- a/app/test/test_rwlock.c
+++ b/app/test/test_rwlock.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -36,7 +35,7 @@
 
 static rte_rwlock_t sl;
 static rte_rwlock_t sl_tab[RTE_MAX_LCORE];
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 enum {
LC_TYPE_RDLOCK,
@@ -102,7 +101,7 @@ load_loop_fn(__rte_unused void *arg)
 
/* wait synchro for workers */
if (lcore != rte_get_main_lcore())
-   while (rte_atomic32_read(&synchro) == 0)
+   while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
;
 
begin = rte_rdtsc_precise();
@@ -136,12 +135,12 @@ test_rwlock_perf(void)
printf("\nRwlock Perf Test on %u cores...\n", rte_lcore_count());
 
/* clear synchro and start workers */
-   rte_atomic32_set(&synchro, 0);
+   __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
if (rte_eal_mp_remote_launch(load_loop_fn, NULL, SKIP_MAIN) < 0)
return -1;
 
/* start synchro and launch test on main */
-   rte_atomic32_set(&synchro, 1);
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
load_loop_fn(NULL);
 
rte_eal_mp_wait_lcore();
-- 
2.17.1



[dpdk-dev] [PATCH v1 4/8] test/mcslock: use GCC atomic builtins for lcores sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins for lcores sync
in mcslock testcases.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_mcslock.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
index 80eaecc90..e6bdeb966 100644
--- a/app/test/test_mcslock.c
+++ b/app/test/test_mcslock.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "test.h"
 
@@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
 
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_mcslock_per_core(__rte_unused void *arg)
@@ -76,7 +75,7 @@ load_loop_fn(void *func_param)
rte_mcslock_t ml_perf_me;
 
/* wait synchro */
-   while (rte_atomic32_read(&synchro) == 0)
+   while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
;
 
begin = rte_get_timer_cycles();
@@ -102,15 +101,15 @@ test_mcslock_perf(void)
const unsigned int lcore = rte_lcore_id();
 
printf("\nTest with no lock on single core...\n");
-   rte_atomic32_set(&synchro, 1);
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
load_loop_fn(&lock);
printf("Core [%u] Cost Time = %"PRIu64" us\n",
lcore, time_count[lcore]);
memset(time_count, 0, sizeof(time_count));
 
printf("\nTest with lock on single core...\n");
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
lock = 1;
-   rte_atomic32_set(&synchro, 1);
load_loop_fn(&lock);
printf("Core [%u] Cost Time = %"PRIu64" us\n",
lcore, time_count[lcore]);
@@ -118,11 +117,11 @@ test_mcslock_perf(void)
 
printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
 
-   rte_atomic32_set(&synchro, 0);
+   __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
/* start synchro and launch test on main */
-   rte_atomic32_set(&synchro, 1);
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
load_loop_fn(&lock);
 
rte_eal_mp_wait_lcore();
-- 
2.17.1



[dpdk-dev] [PATCH v1 5/8] test/mempool: remove unused variable for lcores sync

2021-06-04 Thread Joyce Kong
Remove the unused synchro variable as there is no lcores
sync in mempool function test.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_mempool.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 3adadd673..7675a3e60 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -57,8 +56,6 @@
goto label; \
} while (0)
 
-static rte_atomic32_t synchro;
-
 /*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
@@ -491,8 +488,6 @@ test_mempool(void)
};
const char *default_pool_ops = rte_mbuf_best_mempool_ops();
 
-   rte_atomic32_init(&synchro);
-
/* create a mempool (without cache) */
mp_nocache = rte_mempool_create("test_nocache", MEMPOOL_SIZE,
MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1



[dpdk-dev] [PATCH v1 6/8] test/mempool_perf: use GCC atomic builtins for lcores sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins for lcores sync
in mempool_perf testcases. Meanwhile, remove unnecessary synchro
init as it would be set to 0 when launching cores.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_mempool_perf.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index d7d0aaa33..9271378aa 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -83,7 +82,7 @@
 static int use_external_cache;
 static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 /* number of objects in one bulk operation (get or put) */
 static unsigned n_get_bulk;
@@ -145,7 +144,8 @@ per_lcore_mempool_test(void *arg)
 
/* wait synchro for workers */
if (lcore_id != rte_get_main_lcore())
-   while (rte_atomic32_read(&synchro) == 0);
+   while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
+   ;
 
start_cycles = rte_get_timer_cycles();
 
@@ -198,7 +198,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
int ret;
unsigned cores_save = cores;
 
-   rte_atomic32_set(&synchro, 0);
+   __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
/* reset stats */
memset(stats, 0, sizeof(stats));
@@ -223,7 +223,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
}
 
/* start synchro and launch test on main */
-   rte_atomic32_set(&synchro, 1);
+   __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
ret = per_lcore_mempool_test(mp);
 
@@ -288,8 +288,6 @@ test_mempool_perf(void)
const char *default_pool_ops;
int ret = -1;
 
-   rte_atomic32_init(&synchro);
-
/* create a mempool (without cache) */
mp_nocache = rte_mempool_create("perf_test_nocache", MEMPOOL_SIZE,
MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1



[dpdk-dev] [PATCH v1 7/8] test/service_cores: use GCC atomic builtins for lock sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins for lock sync
in service core testcases.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_service_cores.c | 36 +++
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 37d7172d5..9d908d44e 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -53,18 +53,20 @@ static int32_t dummy_cb(void *args)
 static int32_t dummy_mt_unsafe_cb(void *args)
 {
/* before running test, the initialization has set pass_test to 1.
-* If the cmpset in service-cores is working correctly, the code here
+* If the cas in service-cores is working correctly, the code here
 * should never fail to take the lock. If the lock *is* taken, fail the
 * test, because two threads are concurrently in a non-MT safe callback.
 */
uint32_t *test_params = args;
-   uint32_t *atomic_lock = &test_params[0];
+   uint32_t *lock = &test_params[0];
uint32_t *pass_test = &test_params[1];
-   int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+   uint32_t exp = 0;
+   int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
if (lock_taken) {
/* delay with the lock held */
rte_delay_ms(250);
-   rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+   __atomic_store_n(lock, 0, __ATOMIC_RELAXED);
} else {
/* 2nd thread will fail to take lock, so set pass flag */
*pass_test = 0;
@@ -83,13 +85,15 @@ static int32_t dummy_mt_safe_cb(void *args)
 *that 2 threads are running the callback at the same time: MT safe
 */
uint32_t *test_params = args;
-   uint32_t *atomic_lock = &test_params[0];
+   uint32_t *lock = &test_params[0];
uint32_t *pass_test = &test_params[1];
-   int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+   uint32_t exp = 0;
+   int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
if (lock_taken) {
/* delay with the lock held */
rte_delay_ms(250);
-   rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+   __atomic_store_n(lock, 0, __ATOMIC_RELAXED);
} else {
/* 2nd thread will fail to take lock, so set pass flag */
*pass_test = 1;
@@ -622,9 +626,9 @@ service_threaded_test(int mt_safe)
TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_2),
"mt safe lcore add fail");
 
-   /* Use atomic locks to verify that two threads are in the same function
-* at the same time. These are passed to the unit tests through the
-* callback userdata parameter
+   /* Use locks to verify that two threads are in the same function
+* at the same time. These are passed to the unit tests through
+* the callback userdata parameter.
 */
uint32_t test_params[2];
memset(test_params, 0, sizeof(uint32_t) * 2);
@@ -713,7 +717,7 @@ service_mt_safe_poll(void)
 }
 
 /* tests a NON mt safe service with two cores, the callback is serialized
- * using the atomic cmpset.
+ * using the cas.
  */
 static int
 service_mt_unsafe_poll(void)
@@ -735,17 +739,17 @@ delay_as_a_mt_safe_service(void *args)
RTE_SET_USED(args);
uint32_t *params = args;
 
-   /* retrieve done flag and atomic lock to inc/dec */
+   /* retrieve done flag and lock to add/sub */
uint32_t *done = ¶ms[0];
-   rte_atomic32_t *lock = (rte_atomic32_t *)¶ms[1];
+   uint32_t *lock = ¶ms[1];
 
while (!*done) {
-   rte_atomic32_inc(lock);
+   __atomic_add_fetch(lock, 1, __ATOMIC_RELAXED);
rte_delay_us(500);
-   if (rte_atomic32_read(lock) > 1)
+   if (__atomic_load_n(lock, __ATOMIC_RELAXED) > 1)
/* pass: second core has simultaneously incremented */
*done = 1;
-   rte_atomic32_dec(lock);
+   __atomic_sub_fetch(lock, 1, __ATOMIC_RELAXED);
}
 
return 0;
-- 
2.17.1



[dpdk-dev] [PATCH v1 8/8] test/rcu_perf: use GCC atomic builtins for data sync

2021-06-04 Thread Joyce Kong
Convert rte_atomic usages to GCC atomic builtins in rcu_perf
testcases.

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
---
 app/test/test_rcu_qsbr_perf.c | 98 +--
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
index 3017e7112..cf7b158d2 100644
--- a/app/test/test_rcu_qsbr_perf.c
+++ b/app/test/test_rcu_qsbr_perf.c
@@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
 static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
 static struct rte_hash *h;
 static char hash_name[8];
-static rte_atomic64_t updates, checks;
-static rte_atomic64_t update_cycles, check_cycles;
+static uint64_t updates, checks;
+static uint64_t update_cycles, check_cycles;
 
 /* Scale down results to 1000 operations to support lower
  * granularity clocks.
@@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
}
 
cycles = rte_rdtsc_precise() - begin;
-   rte_atomic64_add(&update_cycles, cycles);
-   rte_atomic64_add(&updates, loop_cnt);
+   __atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+   __atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
/* Make the thread offline */
rte_rcu_qsbr_thread_offline(t[0], thread_id);
@@ -113,8 +113,8 @@ test_rcu_qsbr_writer_perf(void *arg)
} while (loop_cnt < 2000);
 
cycles = rte_rdtsc_precise() - begin;
-   rte_atomic64_add(&check_cycles, cycles);
-   rte_atomic64_add(&checks, loop_cnt);
+   __atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+   __atomic_fetch_add(&checks, loop_cnt, __ATOMIC_RELAXED);
return 0;
 }
 
@@ -130,10 +130,10 @@ test_rcu_qsbr_perf(void)
 
writer_done = 0;
 
-   rte_atomic64_clear(&updates);
-   rte_atomic64_clear(&update_cycles);
-   rte_atomic64_clear(&checks);
-   rte_atomic64_clear(&check_cycles);
+   __atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+   __atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+   __atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+   __atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
printf("\nPerf Test: %d Readers/1 Writer('wait' in qsbr_check == 
true)\n",
num_cores - 1);
@@ -168,15 +168,15 @@ test_rcu_qsbr_perf(void)
rte_eal_mp_wait_lcore();
 
printf("Total quiescent state updates = %"PRIi64"\n",
-   rte_atomic64_read(&updates));
+   __atomic_load_n(&updates, __ATOMIC_RELAXED));
printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
RCU_SCALE_DOWN,
-   rte_atomic64_read(&update_cycles) /
-   (rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
-   printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+   __atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+   (__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
+   printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, 
__ATOMIC_RELAXED));
printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-   rte_atomic64_read(&check_cycles) /
-   (rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+   __atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+   (__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
rte_free(t[0]);
 
@@ -193,8 +193,8 @@ test_rcu_qsbr_rperf(void)
size_t sz;
unsigned int i, tmp_num_cores;
 
-   rte_atomic64_clear(&updates);
-   rte_atomic64_clear(&update_cycles);
+   __atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+   __atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
 
__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -220,11 +220,11 @@ test_rcu_qsbr_rperf(void)
rte_eal_mp_wait_lcore();
 
printf("Total quiescent state updates = %"PRIi64"\n",
-   rte_atomic64_read(&updates));
+   __atomic_load_n(&updates, __ATOMIC_RELAXED));
printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
RCU_SCALE_DOWN,
-   rte_atomic64_read(&update_cycles) /
-   (rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
+   __atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+   (__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
rte_free(t[0]);
 
@@ -241,8 +241,8 @@ test_rcu_qsbr_wperf(void)
size_t sz;
unsigned int i;
 
-   rte_atomic64_clear(&checks);
-   rte_atomic64_clear(&check_cycles);
+   __atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+   __atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -266,10 +266,10 @@ test_rcu_qsbr_wperf(void)
/* Wait until all readers have exited */
rte_eal_mp_wait_lcore();
 
-   printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+

[dpdk-dev] [PATCH] doc: add missing update for recently added features

2021-06-04 Thread Ivan Malov
Actions VXLAN_DECAP and VXLAN_ENCAP need to be listed
among actions supported for transfer flows.

Fixes: 6ab6c40d1e83 ("net/sfc: support action VXLAN decap in transfer rules")
Fixes: 1bbd1ec2348a ("net/sfc: support action VXLAN encap in MAE backend")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Malov 
Reviewed-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 doc/guides/nics/sfc_efx.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index cf1269cc0..df16fff32 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -228,6 +228,10 @@ Supported actions (***transfer*** rules):
 
 - OF_VLAN_SET_PCP
 
+- VXLAN_DECAP
+
+- VXLAN_ENCAP
+
 - FLAG
 
 - MARK
-- 
2.20.1



Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
03/06/2021 11:33, Ferruh Yigit:
> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  wrote:
> >> +  [gpudev] (@ref rte_gpudev.h),
> > 
> > Since this device does not have a queue etc? Shouldn't make it a
> > library like mempool with vendor-defined ops?
> 
> +1
> 
> Current RFC announces additional memory allocation capabilities, which can 
> suits
> better as extension to existing memory related library instead of a new device
> abstraction library.

It is not replacing mempool.
It is more at the same level as EAL memory management:
allocate simple buffer, but with the exception it is done
on a specific device, so it requires a device ID.

The other reason it needs to be a full library is that
it will start a workload on the GPU and get completion notification
so we can integrate the GPU workload in a packet processing pipeline.




Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Wang, Haiyue
> -Original Message-
> From: dev  On Behalf Of Thomas Monjalon
> Sent: Thursday, June 3, 2021 04:36
> To: dev@dpdk.org
> Cc: Elena Agostini 
> Subject: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> From: Elena Agostini 
> 
> The new library gpudev is for dealing with GPU from a DPDK application
> in a vendor-agnostic way.
> 
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the GPU,
> while another one allows to use main (CPU) memory from the GPU.
> 
> The infrastructure is prepared to welcome drivers in drivers/gpu/
> as the upcoming NVIDIA one, implementing the gpudev API.
> Other additions planned for next revisions:
>   - C implementation file
>   - guide documentation
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> 
> The next step should focus on GPU processing task control.
> 
> Signed-off-by: Elena Agostini 
> Signed-off-by: Thomas Monjalon 
> ---
>  .gitignore   |   1 +
>  MAINTAINERS  |   6 +
>  doc/api/doxy-api-index.md|   1 +
>  doc/api/doxy-api.conf.in |   1 +
>  doc/guides/conf.py   |   8 ++
>  doc/guides/gpus/features/default.ini |  13 ++
>  doc/guides/gpus/index.rst|  11 ++
>  doc/guides/gpus/overview.rst |   7 +
>  doc/guides/index.rst |   1 +
>  doc/guides/prog_guide/gpu.rst|   5 +
>  doc/guides/prog_guide/index.rst  |   1 +
>  drivers/gpu/meson.build  |   4 +
>  drivers/meson.build  |   1 +
>  lib/gpudev/gpu_driver.h  |  44 +++
>  lib/gpudev/meson.build   |   9 ++
>  lib/gpudev/rte_gpudev.h  | 183 +++
>  lib/gpudev/version.map   |  11 ++
>  lib/meson.build  |   1 +
>  18 files changed, 308 insertions(+)
>  create mode 100644 doc/guides/gpus/features/default.ini
>  create mode 100644 doc/guides/gpus/index.rst
>  create mode 100644 doc/guides/gpus/overview.rst
>  create mode 100644 doc/guides/prog_guide/gpu.rst
>  create mode 100644 drivers/gpu/meson.build
>  create mode 100644 lib/gpudev/gpu_driver.h
>  create mode 100644 lib/gpudev/meson.build
>  create mode 100644 lib/gpudev/rte_gpudev.h
>  create mode 100644 lib/gpudev/version.map
> 


> +#include 
> +
> +#include 
> +
> +#include "rte_gpudev.h"
> +
> +struct rte_gpu_dev;
> +
> +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void 
> **ptr);
> +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> +
> +struct rte_gpu_dev {
> + /* Backing device. */
> + struct rte_device *device;
> + /* GPU info structure. */
> + struct rte_gpu_info info;
> + /* Counter of processes using the device. */
> + uint16_t process_cnt;
> + /* If device is currently used or not. */
> + enum rte_gpu_state state;
> + /* FUNCTION: Allocate memory on the GPU. */
> + gpu_malloc_t gpu_malloc;
> + /* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> + gpu_malloc_t gpu_malloc_visible;
> + /* FUNCTION: Free allocated memory on the GPU. */
> + gpu_free_t gpu_free;


I'm wondering whether we can define the malloc type as:

typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr,
unsigned int flags)

#define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u --> gpu_malloc_visible

Then only one malloc function member is needed, paired with 'gpu_free'.

> + /* Device interrupt handle. */
> + struct rte_intr_handle *intr_handle;
> + /* Driver-specific private data. */
> + void *dev_private;
> +} __rte_cache_aligned;
> +


> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the GPU.
> + *
> + * @param gpu_id
> + *   GPU ID to allocate memory.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the CPU that is visible from the GPU.
> + *
> + * @param gpu_id
> + *   Reference GPU ID.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);

Then 'rte_gpu_malloc_visible' is not needed, and the new call is:

rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, RTE_GPU_MALLOC_F_CPU_VISIBLE).

Also, we can define more flags for feature extension. ;-)
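
A standalone sketch of that flag-based variant, to make the suggestion
concrete; the signatures are the proposal under discussion, not a
committed DPDK API, and the bodies are plain host-memory stubs standing
in for the real driver ops.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u	/* would replace rte_gpu_malloc_visible() */

static int
rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, unsigned int flags)
{
	(void)gpu_id;
	/* A real driver would pick the GPU or CPU-visible allocator from flags. */
	(void)flags;
	*ptr = malloc(size);
	return *ptr != NULL ? 0 : -1;
}

static int
rte_gpu_free(uint16_t gpu_id, void *ptr)
{
	(void)gpu_id;
	free(ptr);
	return 0;
}

int
main(void)
{
	void *gpu_buf, *cpu_buf;

	if (rte_gpu_malloc(0, 1 << 20, &gpu_buf, 0) != 0)	/* plain GPU memory */
		return 1;
	if (rte_gpu_malloc(0, 1 << 20, &cpu_buf,
			RTE_GPU_MALLOC_F_CPU_VISIBLE) != 0) {	/* CPU memory visible from GPU */
		rte_gpu_free(0, gpu_buf);
		return 1;
	}
	printf("both kinds allocated through a single entry point\n");
	rte_gpu_free(0, cpu_buf);
	rte_gpu_free(0, gpu_buf);
	return 0;
}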

> +
> +#ifdef __cplusplus
> +}
> --
> 2.31

Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Jerin Jacob
On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  wrote:
>
> 03/06/2021 11:33, Ferruh Yigit:
> > On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  
> > > wrote:
> > >> +  [gpudev] (@ref rte_gpudev.h),
> > >
> > > Since this device does not have a queue etc? Shouldn't make it a
> > > library like mempool with vendor-defined ops?
> >
> > +1
> >
> > Current RFC announces additional memory allocation capabilities, which can 
> > suits
> > better as extension to existing memory related library instead of a new 
> > device
> > abstraction library.
>
> It is not replacing mempool.
> It is more at the same level as EAL memory management:
> allocate simple buffer, but with the exception it is done
> on a specific device, so it requires a device ID.
>
> The other reason it needs to be a full library is that
> it will start a workload on the GPU and get completion notification
> so we can integrate the GPU workload in a packet processing pipeline.

I might have confused you. My intention is not to make it fit under the mempool API.

I agree that we need a separate library for this. My objection is only
to the name: do not call it libgpudev, call it libgpu, and have APIs
prefixed rte_gpu_ instead of rte_gpu_dev_, as it is not like the
existing "device libraries" in DPDK and is more like the other
"libraries" in DPDK.



>
>


Re: [dpdk-dev] [PATCH] vhost: allocate and free packets in bulk in Tx split

2021-06-04 Thread Maxime Coquelin
Hi Balazs,

On 5/28/21 12:26 PM, Balazs Nemeth wrote:
> Same idea as commit a287ac28919d ("vhost: allocate and free packets
> in bulk in Tx packed"), allocate and free packets in bulk. Also remove
> the unused function virtio_dev_pktmbuf_alloc.
> 
> Signed-off-by: Balazs Nemeth 
> ---
>  lib/vhost/virtio_net.c | 37 -
>  1 file changed, 8 insertions(+), 29 deletions(-)
> 

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



[dpdk-dev] [PATCH] eal: add include for rte_byteorder on ARM

2021-06-04 Thread Michael Pfeiffer
Including rte_byteorder.h may fail for ARM builds with 'Platform must
be built with RTE_FORCE_INTRINSICS' if rte_config.h is not included
before. Include rte_config.h from rte_byteorder.h to solve the issue.

Signed-off-by: Michael Pfeiffer 
---
 lib/eal/arm/include/rte_byteorder.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/eal/arm/include/rte_byteorder.h b/lib/eal/arm/include/rte_byteorder.h
index df2f1d87ba..1f90db9943 100644
--- a/lib/eal/arm/include/rte_byteorder.h
+++ b/lib/eal/arm/include/rte_byteorder.h
@@ -5,18 +5,19 @@
 #ifndef _RTE_BYTEORDER_ARM_H_
 #define _RTE_BYTEORDER_ARM_H_
 
-#ifndef RTE_FORCE_INTRINSICS
-#  error Platform must be built with RTE_FORCE_INTRINSICS
-#endif
-
 #ifdef __cplusplus
 extern "C" {
 #endif
 
 #include 
 #include 
+#include <rte_config.h>
 #include "generic/rte_byteorder.h"
 
+#ifndef RTE_FORCE_INTRINSICS
+#  error Platform must be built with RTE_FORCE_INTRINSICS
+#endif
+
 /* fix missing __builtin_bswap16 for gcc older then 4.8 */
 #if !(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
 
-- 
2.31.1
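
A minimal reproducer of the failure mode described in the commit message,
assuming an ARM build where RTE_FORCE_INTRINSICS is provided by
rte_config.h: before this patch, having rte_byteorder.h as the first
include hit the "#error Platform must be built with RTE_FORCE_INTRINSICS";
with the patch, the header pulls in rte_config.h itself and either include
order builds.

#include <rte_byteorder.h>	/* previously had to come after rte_config.h */
#include <rte_config.h>

int
main(void)
{
	return rte_bswap16(0x1234) == 0x3412 ? 0 : 1;
}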



[dpdk-dev] [PATCH v2] vhost: allocate and free packets in bulk in Tx split

2021-06-04 Thread Balazs Nemeth
Same idea as commit a287ac28919d ("vhost: allocate and free packets
in bulk in Tx packed"), allocate and free packets in bulk. Also remove
the unused function virtio_dev_pktmbuf_alloc.

Signed-off-by: Balazs Nemeth 
---
 lib/vhost/virtio_net.c | 37 -
 1 file changed, 8 insertions(+), 29 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 8da8a86a10..fa387b5ff4 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2670,32 +2670,6 @@ virtio_dev_pktmbuf_prep(struct virtio_net *dev, struct rte_mbuf *pkt,
return -1;
 }
 
-/*
- * Allocate a host supported pktmbuf.
- */
-static __rte_always_inline struct rte_mbuf *
-virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
-uint32_t data_len)
-{
-   struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
-
-   if (unlikely(pkt == NULL)) {
-   VHOST_LOG_DATA(ERR,
-   "Failed to allocate memory for mbuf.\n");
-   return NULL;
-   }
-
-   if (virtio_dev_pktmbuf_prep(dev, pkt, data_len)) {
-   /* Data doesn't fit into the buffer and the host supports
-* only linear buffers
-*/
-   rte_pktmbuf_free(pkt);
-   return NULL;
-   }
-
-   return pkt;
-}
-
 __rte_always_inline
 static uint16_t
 virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
@@ -2725,6 +2699,9 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
VHOST_LOG_DATA(DEBUG, "(%d) about to dequeue %u buffers\n",
dev->vid, count);
 
+   if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts, count))
+   return 0;
+
for (i = 0; i < count; i++) {
struct buf_vector buf_vec[BUF_VECTOR_MAX];
uint16_t head_idx;
@@ -2741,8 +2718,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
update_shadow_used_ring_split(vq, head_idx, 0);
 
-   pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len);
-   if (unlikely(pkts[i] == NULL)) {
+   err = virtio_dev_pktmbuf_prep(dev, pkts[i], buf_len);
+   if (unlikely(err)) {
/*
 * mbuf allocation fails for jumbo packets when external
 * buffer allocation is not allowed and linear buffer
@@ -2762,7 +2739,6 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
mbuf_pool, legacy_ol_flags);
if (unlikely(err)) {
-   rte_pktmbuf_free(pkts[i]);
if (!allocerr_warned) {
VHOST_LOG_DATA(ERR,
"Failed to copy desc to mbuf on %s.\n",
@@ -2775,6 +2751,9 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
}
}
 
+   if (dropped)
+   rte_pktmbuf_free_bulk(&pkts[i - 1], count - i + 1);
+
vq->last_avail_idx += i;
 
do_data_copy_dequeue(vq);
-- 
2.31.1



Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 13:07, Wang, Haiyue:
> > From: Elena Agostini 
> > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void 
> > **ptr);
> > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> > +
[...]
> > +   /* FUNCTION: Allocate memory on the GPU. */
> > +   gpu_malloc_t gpu_malloc;
> > +   /* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > +   gpu_malloc_t gpu_malloc_visible;
> > +   /* FUNCTION: Free allocated memory on the GPU. */
> > +   gpu_free_t gpu_free;
> 
> 
> I'm wondering that we can define the malloc type as:
> 
> typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr,
>   unsigned int flags)
> 
> #define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u --> gpu_malloc_visible
> 
> Then only one malloc function member is needed, paired with 'gpu_free'.
[...]
> > +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
[...]
> > +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
> 
> Then 'rte_gpu_malloc_visible' is no needed, and the new call is:
> 
> rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, 
> RTE_GPU_MALLOC_F_CPU_VISIBLE).
> 
> Also, we can define more flags for feature extension. ;-)

Yes it is a good idea.

Another question is about the function rte_gpu_free().
How do we recognize whether a memory chunk is CPU memory visible from
the GPU, or plain GPU memory?
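
One possible answer, sketched below with invented names and plain
malloc() standing in for the device allocators (this is not a committed
design): record each allocation's flags in a small per-device registry at
allocation time, so rte_gpu_free() can look the pointer up and call the
matching driver op.

#include <stdio.h>
#include <stdlib.h>

struct gpu_alloc {
	void *ptr;
	unsigned int flags;		/* e.g. RTE_GPU_MALLOC_F_CPU_VISIBLE */
	struct gpu_alloc *next;
};

static struct gpu_alloc *allocs;	/* one list per device in a real driver */

static void
track(void *ptr, unsigned int flags)
{
	struct gpu_alloc *a = malloc(sizeof(*a));

	a->ptr = ptr;
	a->flags = flags;
	a->next = allocs;
	allocs = a;
}

static int
untrack(void *ptr, unsigned int *flags)
{
	struct gpu_alloc **p;

	for (p = &allocs; *p != NULL; p = &(*p)->next) {
		if ((*p)->ptr == ptr) {
			struct gpu_alloc *a = *p;

			*flags = a->flags;
			*p = a->next;
			free(a);
			return 0;	/* found: caller knows which free path to use */
		}
	}
	return -1;			/* unknown pointer */
}

int
main(void)
{
	void *buf = malloc(64);		/* stands in for a device allocation */
	unsigned int flags;

	track(buf, 0x01u);		/* remember it was CPU-visible */
	if (untrack(buf, &flags) == 0)
		printf("free path chosen from flags 0x%x\n", flags);
	free(buf);
	return 0;
}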




Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 13:09, Jerin Jacob:
> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  wrote:
> > 03/06/2021 11:33, Ferruh Yigit:
> > > On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  
> > > > wrote:
> > > >> +  [gpudev] (@ref rte_gpudev.h),
> > > >
> > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > library like mempool with vendor-defined ops?
> > >
> > > +1
> > >
> > > Current RFC announces additional memory allocation capabilities, which 
> > > can suits
> > > better as extension to existing memory related library instead of a new 
> > > device
> > > abstraction library.
> >
> > It is not replacing mempool.
> > It is more at the same level as EAL memory management:
> > allocate simple buffer, but with the exception it is done
> > on a specific device, so it requires a device ID.
> >
> > The other reason it needs to be a full library is that
> > it will start a workload on the GPU and get completion notification
> > so we can integrate the GPU workload in a packet processing pipeline.
> 
> I might have confused you. My intention is not to make it fit under the
> mempool API.
>
> I agree that we need a separate library for this. My objection is only
> to the name: do not call it libgpudev, call it libgpu, and have APIs
> prefixed rte_gpu_ instead of rte_gpu_dev_, as it is not like the
> existing "device libraries" in DPDK and is more like the other
> "libraries" in DPDK.

I think we should define a queue of processing actions,
so it looks like other device libraries.
And anyway I think a library managing a device class,
and having some device drivers deserves the name of device library.

I would like to read more opinions.




Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
03/06/2021 13:38, Jerin Jacob:
> On Thu, Jun 3, 2021 at 4:00 PM Thomas Monjalon  wrote:
> > 03/06/2021 12:04, Jerin Jacob:
> > > On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon  
> > > wrote:
> > > > 03/06/2021 11:20, Jerin Jacob:
> > > > > The device needs have a queue kind of structure
> > > > > and it is mapping to core to have a notion of configure. queue_setup,
> > > > > start and stop etc
> > > >
> > > > Why is it a requirement to call it a device API?
> > >
> > > Then we need to define what needs to call as device library vs library 
> > > and how?
> > > Why mempool is not called a  device library vs library?
> >
> > My view is simple:
> > if it has drivers, it is a device API, except bus and mempool libs.
> 
> rte_secuity has drivers but it is not called a device library.

rte_security is a monster beast :)
Yes it has rte_security_ops implemented in net and crypto drivers,
but it is an API extension only, there is no driver dedicated to security.

> > About mempool, it started as a standard lib and got extended for HW support.
> 
> Yes. We did not change to device library as it was fundamentally
> different from other DPDK devices
> when we added the device support.
> 
> > > and why all
> > > other device library has a common structure like queues and
> > > it binding core etc. I tried to explain above the similar attributes
> > > for dpdk device libraries[1] which I think, it a requirement so
> > > that the end user will have familiarity with device libraries rather
> > > than each one has separate General guidelines and principles.
> > >
> > > I think, it is more TB discussion topic and decides on this because I
> > > don't see in technical issue in calling it a library.
> >
> > The naming is just a choice.
> 
> Not sure.
> 
> > Yesterday morning it was called lib/gpu/
> > and in the evening it was renamed lib/gpudev/
> > so no technical issue :)
> >
> > But the design of the API with queues or other paradigm
> > is something I would like to discuss here.
> 
> Yeah, That is important. IMO, That defines what needs to be a device library.
> 
> > Note: there was no intent to publish GPU processing control
> > in DPDK 21.08. We want to focus on GPU memory in 21.08,
> > but I understand it is a key decision in the big picture.
> 
> if the scope is only memory allocation, IMO, it is better to make a library.

No it is only the first step.

> > What would be your need and would you design such API?
> 
> For me, there is no need for a gpu library (as of now). Maybe GPU consumers
> can define what they need to control using the library.

We need to integrate the GPU processing workload in the DPDK workflow
as a generic API.
There could be 2 modes:
- queue of tasks
- tasks in an infinite loop
In both modes, we could get completion notifications
with an interrupt/callback or by polling shared memory.
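
(Purely illustrative sketch of the "polling shared memory" completion
model mentioned above; none of these names exist in DPDK, it only shows
how a task descriptor placed in CPU memory visible from the GPU could be
polled from the packet processing loop.)

#include <stdint.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical task descriptor; the GPU kernel sets 'done' when it has
 * finished processing 'input'.
 */
struct gpu_task {
	void *input;             /* e.g. a burst of packets in GPU memory */
	void *output;
	_Atomic uint32_t done;   /* written by the GPU, polled by the CPU */
};

/* Called from the CPU packet processing loop; non-blocking. */
static bool
gpu_task_completed(struct gpu_task *task)
{
	return atomic_load_explicit(&task->done, memory_order_acquire) != 0;
}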





Re: [dpdk-dev] [PATCH] kni: fix compilation on SLES15-SP3

2021-06-04 Thread Luca Boccassi
On Wed, 2021-06-02 at 16:33 +0200, Christian Ehrhardt wrote:
> Like what was done for mainline kernel in commit 38ad54f3bc76 ("kni: fix
> build with Linux 5.6"), a new parameter 'txqueue' has to be added to
> 'ndo_tx_timeout' ndo on SLES 15-SP3 kernel.
> 
> Caused by:
>   commit c3bf155c40e9db722feb8a08c19efd44c12d5294
>   Author: Thomas Bogendoerfer 
>   Date:   Fri Sep 11 16:08:31 2020 +0200
>   - netdev: pass the stuck queue to the timeout handler
> (jsc#SLE-13536).
>   - Refresh patches.suse/sfc-move-various-functions.patch.
> 
> That is part of the SLES 5.3.18 kernel and therefore the
> version we check for.
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Christian Ehrhardt 
> ---
>  kernel/linux/kni/compat.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h
> index 5f65640d5ed..70e014fd1da 100644
> --- a/kernel/linux/kni/compat.h
> +++ b/kernel/linux/kni/compat.h
> @@ -133,7 +133,9 @@
>  
>  #if KERNEL_VERSION(5, 6, 0) <= LINUX_VERSION_CODE || \
>   (defined(RHEL_RELEASE_CODE) && \
> -  RHEL_RELEASE_VERSION(8, 3) <= RHEL_RELEASE_CODE)
> +  RHEL_RELEASE_VERSION(8, 3) <= RHEL_RELEASE_CODE) || \
> + (defined(CONFIG_SUSE_KERNEL) && \
> +  KERNEL_VERSION(5, 3, 18) <= LINUX_VERSION_CODE)
>  #define HAVE_TX_TIMEOUT_TXQUEUE
>  #endif
> 

Acked-by: Luca Boccassi 

-- 
Kind regards,
Luca Boccassi


Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Andrew Rybchenko
On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> 04/06/2021 13:09, Jerin Jacob:
>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  wrote:
>>> 03/06/2021 11:33, Ferruh Yigit:
 On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  
> wrote:
>> +  [gpudev] (@ref rte_gpudev.h),
>
> Since this device does not have a queue etc? Shouldn't make it a
> library like mempool with vendor-defined ops?

 +1

 Current RFC announces additional memory allocation capabilities, which can 
 suits
 better as extension to existing memory related library instead of a new 
 device
 abstraction library.
>>>
>>> It is not replacing mempool.
>>> It is more at the same level as EAL memory management:
>>> allocate simple buffer, but with the exception it is done
>>> on a specific device, so it requires a device ID.
>>>
>>> The other reason it needs to be a full library is that
>>> it will start a workload on the GPU and get completion notification
>>> so we can integrate the GPU workload in a packet processing pipeline.
>>
>> I might have confused you. My intention is not to make to fit under mempool 
>> API.
>>
>> I agree that we need a separate library for this. My objection is only
>> to not call libgpudev and
>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
>> it not like existing "device libraries" in DPDK and
>> it like other "libraries" in DPDK.
> 
> I think we should define a queue of processing actions,
> so it looks like other device libraries.
> And anyway I think a library managing a device class,
> and having some device drivers deserves the name of device library.
> 
> I would like to read more opinions.

Since the library is a unified interface to GPU device drivers
I think it should be named as in the patch - gpudev.

Mempool looks like an exception here - initially it was a pure SW
library, but now there are HW backends and corresponding device
drivers.

What I don't understand is where the GPU specifics are here.
I.e. why GPU? A NIC can have its own memory and provide a
corresponding API.

What's the difference between "the memory on the CPU that is visible from the
GPU" and existing memzones which are DMA mapped?


Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 15:05, Andrew Rybchenko:
> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > 04/06/2021 13:09, Jerin Jacob:
> >> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  wrote:
> >>> 03/06/2021 11:33, Ferruh Yigit:
>  On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  
> > wrote:
> >> +  [gpudev] (@ref rte_gpudev.h),
> >
> > Since this device does not have a queue etc? Shouldn't make it a
> > library like mempool with vendor-defined ops?
> 
>  +1
> 
>  Current RFC announces additional memory allocation capabilities, which 
>  can suits
>  better as extension to existing memory related library instead of a new 
>  device
>  abstraction library.
> >>>
> >>> It is not replacing mempool.
> >>> It is more at the same level as EAL memory management:
> >>> allocate simple buffer, but with the exception it is done
> >>> on a specific device, so it requires a device ID.
> >>>
> >>> The other reason it needs to be a full library is that
> >>> it will start a workload on the GPU and get completion notification
> >>> so we can integrate the GPU workload in a packet processing pipeline.
> >>
> >> I might have confused you. My intention is not to make to fit under 
> >> mempool API.
> >>
> >> I agree that we need a separate library for this. My objection is only
> >> to not call libgpudev and
> >> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> >> it not like existing "device libraries" in DPDK and
> >> it like other "libraries" in DPDK.
> > 
> > I think we should define a queue of processing actions,
> > so it looks like other device libraries.
> > And anyway I think a library managing a device class,
> > and having some device drivers deserves the name of device library.
> > 
> > I would like to read more opinions.
> 
> Since the library is an unified interface to GPU device drivers
> I think it should be named as in the patch - gpudev.
> 
> Mempool looks like an exception here - initially it was pure SW
> library, but not there are HW backends and corresponding device
> drivers.
> 
> What I don't understand where is GPU specifics here?

That's an interesting question.
Let's ask first what is a GPU for DPDK?
I think it is like a sub-CPU with high parallel execution capabilities,
and it is controlled by the CPU.

> I.e. why GPU? NIC can have own memory and provide corresponding API.

So far we don't need to explicitly allocate memory on the NIC.
The packets are received or copied to the CPU memory.
In the GPU case, the NIC could save the packets directly
in the GPU memory, thus the need to manage the GPU memory.

Also, because the GPU program is dynamically loaded,
there is no fixed API to interact with the GPU workload except via memory.

> What's the difference of "the memory on the CPU that is visible from the
> GPU" from existing memzones which are DMA mapped?

The only difference is that the GPU must map the CPU memory
in its program logic.




Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Wang, Haiyue
> -----Original Message-----
> From: Thomas Monjalon 
> Sent: Friday, June 4, 2021 20:44
> To: Wang, Haiyue 
> Cc: dev@dpdk.org; Elena Agostini 
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 13:07, Wang, Haiyue:
> > > From: Elena Agostini 
> > > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void 
> > > **ptr);
> > > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> > > +
> [...]
> > > + /* FUNCTION: Allocate memory on the GPU. */
> > > + gpu_malloc_t gpu_malloc;
> > > + /* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > > + gpu_malloc_t gpu_malloc_visible;
> > > + /* FUNCTION: Free allocated memory on the GPU. */
> > > + gpu_free_t gpu_free;
> >
> >
> > I'm wondering that we can define the malloc type as:
> >
> > typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void 
> > **ptr,
> > unsigned int flags)
> >
> > #define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u --> gpu_malloc_visible
> >
> > Then only one malloc function member is needed, paired with 'gpu_free'.
> [...]
> > > +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> [...]
> > > +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
> >
> > Then 'rte_gpu_malloc_visible' is no needed, and the new call is:
> >
> > rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, 
> > RTE_GPU_MALLOC_F_CPU_VISIBLE).
> >
> > Also, we can define more flags for feature extension. ;-)
> 
> Yes it is a good idea.
> 
> Another question is about the function rte_gpu_free().
> How do we recognize that a memory chunk is from the CPU and GPU visible,
> or just from GPU?
> 

I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*

Looks like the rte_gpu_free can handle this case ?

And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
free needs to check whether this memory belongs to the GPU or not, so it
can also recognize the memory type, I think.


Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Andrew Rybchenko
On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> 04/06/2021 15:05, Andrew Rybchenko:
>> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
>>> 04/06/2021 13:09, Jerin Jacob:
 On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  wrote:
> 03/06/2021 11:33, Ferruh Yigit:
>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  
>>> wrote:
 +  [gpudev] (@ref rte_gpudev.h),
>>>
>>> Since this device does not have a queue etc? Shouldn't make it a
>>> library like mempool with vendor-defined ops?
>>
>> +1
>>
>> Current RFC announces additional memory allocation capabilities, which 
>> can suits
>> better as extension to existing memory related library instead of a new 
>> device
>> abstraction library.
>
> It is not replacing mempool.
> It is more at the same level as EAL memory management:
> allocate simple buffer, but with the exception it is done
> on a specific device, so it requires a device ID.
>
> The other reason it needs to be a full library is that
> it will start a workload on the GPU and get completion notification
> so we can integrate the GPU workload in a packet processing pipeline.

 I might have confused you. My intention is not to make to fit under 
 mempool API.

 I agree that we need a separate library for this. My objection is only
 to not call libgpudev and
 call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
 it not like existing "device libraries" in DPDK and
 it like other "libraries" in DPDK.
>>>
>>> I think we should define a queue of processing actions,
>>> so it looks like other device libraries.
>>> And anyway I think a library managing a device class,
>>> and having some device drivers deserves the name of device library.
>>>
>>> I would like to read more opinions.
>>
>> Since the library is an unified interface to GPU device drivers
>> I think it should be named as in the patch - gpudev.
>>
>> Mempool looks like an exception here - initially it was pure SW
>> library, but not there are HW backends and corresponding device
>> drivers.
>>
>> What I don't understand where is GPU specifics here?
> 
> That's an interesting question.
> Let's ask first what is a GPU for DPDK?
> I think it is like a sub-CPU with high parallel execution capabilities,
> and it is controlled by the CPU.

I have no good idea how to name it in accordance with the
above description while avoiding the "G", which stands for "Graphics" if I
understand correctly. However, maybe it is not required.
No strong opinion on the topic, but unbinding from
"Graphics" would be nice.

>> I.e. why GPU? NIC can have own memory and provide corresponding API.
> 
> So far we don't need to explicitly allocate memory on the NIC.
> The packets are received or copied to the CPU memory.
> In the GPU case, the NIC could save the packets directly
> in the GPU memory, thus the need to manage the GPU memory.
> 
> Also, because the GPU program is dynamically loaded,
> there is no fixed API to interact with the GPU workload except via memory.
> 
>> What's the difference of "the memory on the CPU that is visible from the
>> GPU" from existing memzones which are DMA mapped?
> 
> The only difference is that the GPU must map the CPU memory
> in its program logic.

I see. Thanks for the explanations.


Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 15:25, Wang, Haiyue:
> From: Thomas Monjalon 
> > Another question is about the function rte_gpu_free().
> > How do we recognize that a memory chunk is from the CPU and GPU visible,
> > or just from GPU?
> > 
> 
> I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
> comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*
> 
> Looks like the rte_gpu_free can handle this case ?

This is the proposal, yes.

> And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
> free needs to check whether this memory belong to the GPU or not, so it
> also can recognize the memory type, I think.

Yes that's the idea behind having a single free function.
We could have some metadata in front of the memory chunk.
My question is to confirm whether it is a good design or not,
and whether it should be driver specific or have a common struct in the lib.

Opinions?
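
(For illustration only, the "metadata in front of the memory chunk" idea
could look like the sketch below; the struct and helper names are
hypothetical and not part of any proposed API. Whether such a header is
driver specific or a common struct in the lib is exactly the open
question above.)

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-allocation header stored just before the pointer
 * returned to the caller, so that a single free function can tell
 * whether the chunk is GPU-only or CPU-visible memory.
 */
struct gpu_mem_hdr {
	uint16_t gpu_id;
	uint16_t flags;     /* e.g. the CPU-visible flag discussed above */
	size_t size;
};

static void *
gpu_mem_alloc_with_hdr(uint16_t gpu_id, size_t size, uint16_t flags)
{
	/* Emulated with host memory; a driver would use its own allocator. */
	struct gpu_mem_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (hdr == NULL)
		return NULL;
	hdr->gpu_id = gpu_id;
	hdr->flags = flags;
	hdr->size = size;
	return hdr + 1;     /* caller pointer starts right after the header */
}

static void
gpu_mem_free_with_hdr(void *ptr)
{
	struct gpu_mem_hdr *hdr = (struct gpu_mem_hdr *)ptr - 1;

	/* hdr->flags selects the GPU-only or CPU-visible free path. */
	free(hdr);
}

int
main(void)
{
	void *p = gpu_mem_alloc_with_hdr(0, 64, 0x01u);

	if (p == NULL)
		return 1;
	gpu_mem_free_with_hdr(p);
	return 0;
}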




Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 15:59, Andrew Rybchenko:
> On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > 04/06/2021 15:05, Andrew Rybchenko:
> >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> >>> 04/06/2021 13:09, Jerin Jacob:
>  On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  
>  wrote:
> > 03/06/2021 11:33, Ferruh Yigit:
> >> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> >>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon  
> >>> wrote:
>  +  [gpudev] (@ref rte_gpudev.h),
> >>>
> >>> Since this device does not have a queue etc? Shouldn't make it a
> >>> library like mempool with vendor-defined ops?
> >>
> >> +1
> >>
> >> Current RFC announces additional memory allocation capabilities, which 
> >> can suits
> >> better as extension to existing memory related library instead of a 
> >> new device
> >> abstraction library.
> >
> > It is not replacing mempool.
> > It is more at the same level as EAL memory management:
> > allocate simple buffer, but with the exception it is done
> > on a specific device, so it requires a device ID.
> >
> > The other reason it needs to be a full library is that
> > it will start a workload on the GPU and get completion notification
> > so we can integrate the GPU workload in a packet processing pipeline.
> 
>  I might have confused you. My intention is not to make to fit under 
>  mempool API.
> 
>  I agree that we need a separate library for this. My objection is only
>  to not call libgpudev and
>  call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
>  it not like existing "device libraries" in DPDK and
>  it like other "libraries" in DPDK.
> >>>
> >>> I think we should define a queue of processing actions,
> >>> so it looks like other device libraries.
> >>> And anyway I think a library managing a device class,
> >>> and having some device drivers deserves the name of device library.
> >>>
> >>> I would like to read more opinions.
> >>
> >> Since the library is an unified interface to GPU device drivers
> >> I think it should be named as in the patch - gpudev.
> >>
> >> Mempool looks like an exception here - initially it was pure SW
> >> library, but not there are HW backends and corresponding device
> >> drivers.
> >>
> >> What I don't understand where is GPU specifics here?
> > 
> > That's an interesting question.
> > Let's ask first what is a GPU for DPDK?
> > I think it is like a sub-CPU with high parallel execution capabilities,
> > and it is controlled by the CPU.
> 
> I have no good ideas how to name it in accordance with
> above description to avoid "G" which for "Graphics" if
> understand correctly. However, may be it is not required.
> No strong opinion on the topic, but unbinding from
> "Graphics" would be nice.

That's a question I have been asking myself for months now.
I am not able to find a better name,
and I am starting to think that "GPU" is well known enough in high-load computing
to convey the idea of what we can expect.





Re: [dpdk-dev] [PATCH] raw/ioat: fix missing device name in idxd bus scan

2021-06-04 Thread Pai G, Sunil


> -----Original Message-----
> From: Richardson, Bruce 
> Sent: Thursday, May 27, 2021 7:58 PM
> To: Laatz, Kevin 
> Cc: dev@dpdk.org; sta...@dpdk.org; Pai G, Sunil 
> Subject: Re: [PATCH] raw/ioat: fix missing device name in idxd bus scan
> 
> On Thu, May 27, 2021 at 02:36:09PM +0100, Kevin Laatz wrote:
> > The device name is not being initialized during the idxd bus scan
> > which will cause segmentation faults when an application tries to
> > access this information.
> >
> > This patch adds the required initialization of the device name so that
> > it can be read without issues.
> >
> > Fixes: b7aaf417f936 ("raw/ioat: add bus driver for device scanning
> > automatically")
> >
> > Reported-by: Sunil Pai G 
> > Signed-off-by: Kevin Laatz 
> > ---
> Acked-by: Bruce Richardson 

Tested-by: Sunil Pai G 


[dpdk-dev] [PATCH v2 0/3] Increase test compatibility with PA IOVA

2021-06-04 Thread Stanislaw Kardach
While working on a RISC-V port, using a HiFive Unmatched (FU740) which
does not have IOMMU (hence only RTE_IOVA_PA is available), I've noticed
that some of the EAL tests are failing because of a totally different
reason than the test itself.
Namely the --no-huge flag and --iova-mode=pa can't be used together and
EAL init fails warning about a lack of access to physical addresses.
This patchset tries to clean up the --no-huge usage so that it doesn't
hide the real state of tests when RTE_IOVA_PA is used (i.e. on platforms
without IOMMU).

I'm proposing to skip the no-huge test in RTE_IOVA_PA environments, as
it is not supported by design, and to remove the no-huge usage on Linux,
as it seems to be used (along with --no-shconf) only to increase
compatibility with FreeBSD.

Please let me know if I'm missing a bigger picture with the --no-huge
and --no-shconf usage on non-FreeBSD platforms.

I'm not adding sta...@dpdk.org on purpose as this does not affect any
current platform I'm aware of (at least in a production scenario).

---

V2:
- Fix checkpatch errors
- Add affected platform in the cover letter.

Stanislaw Kardach (3):
  test: disable no-huge test with PA IOVA
  test: disable no-huge where it's not necessary
  test: fix the -n unit test description

 app/test/test_eal_flags.c | 63 ++-
 1 file changed, 42 insertions(+), 21 deletions(-)

-- 
2.27.0



[dpdk-dev] [PATCH v2 1/3] test: disable no-huge test with PA IOVA

2021-06-04 Thread Stanislaw Kardach
On Linux systems where IOMMU support is not available (be it lack of a
supported IOMMU, lack of IOMMU support in the kernel, or an explicit --no-huge
EAL parameter), the IOVA mapping will default to DMA with physical
addresses. This implicitly requires hugepage support to work (checked in
rte_eal_using_phys_addrs).
Therefore trying to run eal_flags_no_huge_autotest in such a scenario
is not valid. This issue was discovered on the RISC-V arch.

To verify this even on x86 do (output from i5-10210U):

$ ./app/test/dpdk-test -m 18 --iova-mode=pa --no-huge
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: FATAL: Cannot use IOVA as 'PA' since physical addresses are not ...
EAL: Cannot use IOVA as 'PA' since physical addresses are not available

While doing:

$ sudo ./app/test/dpdk-test --iova-mode=pa
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: Probing VFIO support...
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>

This commit finishes the above test early with SKIP status to signify
that no-huge support is simply not available.

Signed-off-by: Stanislaw Kardach 
---
 app/test/test_eal_flags.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 932fbe3d08..462dc63842 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -756,6 +756,15 @@ test_no_huge_flag(void)
 #else
const char * prefix = "--file-prefix=nohuge";
 #endif
+#ifdef RTE_EXEC_ENV_LINUX
+   /* EAL requires hugepages for RTE_IOVA_PA operation on linux.
+* The test application is run with RTE_IOVA_DC, so if at this point we
+* get RTE_IOVA_PA, it means that newly spawned process will also get
+* it.
+*/
+   if (rte_eal_iova_mode() == RTE_IOVA_PA)
+   return TEST_SKIPPED;
+#endif
 
/* With --no-huge */
const char *argv1[] = {prgname, prefix, no_huge};
-- 
2.27.0



[dpdk-dev] [PATCH v2 2/3] test: disable no-huge where it's not necessary

2021-06-04 Thread Stanislaw Kardach
In tests where the no-shconf flag is used, no-huge is also passed for
compatibility with FreeBSD, as described in b5d878e6d.
However, on Linux systems with RTE_IOVA_PA (lack of, or an incompatible,
IOMMU) this causes issues since hugepages are required by EAL.
Therefore replace all occurrences of no_huge which don't actually test
the no-huge logic with an execution-environment-dependent
no_huge_compat to indicate that it is passed as a compatibility flag,
not as a requirement of the test itself.

Signed-off-by: Stanislaw Kardach 
Fixes: b5d878e6db56 ("test: fix EAL flags autotest on FreeBSD")
Cc: anatoly.bura...@intel.com
---
 app/test/test_eal_flags.c | 50 ---
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 462dc63842..e2248a5d9a 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -29,6 +29,17 @@
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
 #define no_huge "--no-huge"
+/* FreeBSD does not support running multiple primary processes, hence for tests
+ * requiring no-shconf, no-huge is also required.
+ * On Linux on the other hand no-huge is not needed so don't pass it as it
+ * would break cases when IOMMU is not able to provide IOVA translation
+ * (rte_eal_iova_mode() == RTE_IOVA_PA).
+ */
+#ifdef RTE_EXEC_ENV_LINUX
+#define no_huge_compat ""
+#else
+#define no_huge_compat no_huge
+#endif
 #define no_shconf "--no-shconf"
 #define allow "--allow"
 #define vdev "--vdev"
@@ -354,18 +365,18 @@ test_invalid_vdev_flag(void)
 #endif
 
/* Test with invalid vdev option */
-   const char *vdevinval[] = {prgname, prefix, no_huge,
-   vdev, "eth_dummy"};
+   const char * const vdevinval[] = {prgname, prefix, no_huge_compat,
+ vdev, "eth_dummy"};
 
/* Test with valid vdev option */
-   const char *vdevval1[] = {prgname, prefix, no_huge,
-   vdev, "net_ring0"};
+   const char * const vdevval1[] = {prgname, prefix, no_huge_compat,
+vdev, "net_ring0"};
 
-   const char *vdevval2[] = {prgname, prefix, no_huge,
-   vdev, "net_ring0,args=test"};
+   const char * const vdevval2[] = {prgname, prefix, no_huge_compat,
+vdev, "net_ring0,args=test"};
 
-   const char *vdevval3[] = {prgname, prefix, no_huge,
-   vdev, "net_ring0,nodeaction=r1:0:CREATE"};
+   const char * const vdevval3[] = {prgname, prefix, no_huge_compat,
+   vdev, "net_ring0,nodeaction=r1:0:CREATE"};
 
if (launch_proc(vdevinval) == 0) {
printf("Error - process did run ok with invalid "
@@ -674,19 +685,20 @@ test_invalid_n_flag(void)
 #endif
 
/* -n flag but no value */
-   const char *argv1[] = { prgname, prefix, no_huge, no_shconf,
-   "-n"};
+   const char * const argv1[] = { prgname, prefix, no_huge_compat,
+  no_shconf, "-n"};
/* bad numeric value */
-   const char *argv2[] = { prgname, prefix, no_huge, no_shconf,
-   "-n", "e" };
+   const char * const argv2[] = { prgname, prefix, no_huge_compat,
+  no_shconf, "-n", "e" };
/* zero is invalid */
-   const char *argv3[] = { prgname, prefix, no_huge, no_shconf,
-   "-n", "0" };
+   const char * const argv3[] = { prgname, prefix, no_huge_compat,
+  no_shconf, "-n", "0" };
/* sanity test - check with good value */
-   const char *argv4[] = { prgname, prefix, no_huge, no_shconf,
-   "-n", "2" };
+   const char * const argv4[] = { prgname, prefix, no_huge_compat,
+  no_shconf, "-n", "2" };
/* sanity test - check with no -n flag */
-   const char *argv5[] = { prgname, prefix, no_huge, no_shconf};
+   const char * const argv5[] = { prgname, prefix, no_huge_compat,
+  no_shconf};
 
if (launch_proc(argv1) == 0
|| launch_proc(argv2) == 0
@@ -878,7 +890,7 @@ test_misc_flags(void)
const char *argv5[] = {prgname, prefix, mp_flag, "--syslog", "error"};
/* With no-sh-conf, also use no-huge to ensure this test runs on BSD */
const char *argv6[] = {prgname, "-m", DEFAULT_MEM_SIZE,
-   no_shconf, nosh_prefix, no_huge};
+   no_shconf, nosh_prefix, no_huge_compat};
 
/* With --huge-dir */
const char *argv7[] = {prgname, "-m", DEFAULT_MEM_SIZE,
@@ -920,7 +932,7 @@ test_misc_flags(void)
 
/* With process type as auto-detect with no-shconf */
const char * const argv17[] = {prgname, "--proc-type=auto",
-   no_shconf, nosh

[dpdk-dev] [PATCH v2 3/3] test: fix the -n unit test description

2021-06-04 Thread Stanislaw Kardach
When the -n argument became optional, the test logic was fixed (by
1e0b51fd4) but the comment indicating why --no-huge and --no-shconf are
used was not updated.
Today those flags are used for compatibility with FreeBSD (see
b5d878e6d), so change the comment to reflect that.

Signed-off-by: Stanislaw Kardach 

Fixes: b5d878e6db56 ("test: fix EAL flags autotest on FreeBSD")
Cc: anatoly.bura...@intel.com
Fixes: 1e0b51fd4b75 ("app/test: fix unit test for option -n")
Cc: pablo.de.lara.gua...@intel.com
---
 app/test/test_eal_flags.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index e2248a5d9a..b1ab87cf8d 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -666,8 +666,8 @@ test_main_lcore_flag(void)
 /*
  * Test that the app doesn't run with invalid -n flag option.
  * Final test ensures it does run with valid options as sanity check
- * Since -n is not compulsory for MP, we instead use --no-huge and --no-shconf
- * flags.
+ * For compatibility with BSD use --no-huge and --no-shconf flags as we need to
+ * run a primary process.
  */
 static int
 test_invalid_n_flag(void)
-- 
2.27.0



[dpdk-dev] [PATCH v2 01/20] net/sfc: introduce ethdev Rx queue ID

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Make software index of an Rx queue and ethdev index separate.
When an ethdev RxQ is accessed in ethdev callbacks, an explicit ethdev
queue index is used.

This is a preparation for introducing non-ethdev Rx queues.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc.h|   2 +
 drivers/net/sfc/sfc_dp.h |   4 +
 drivers/net/sfc/sfc_ethdev.c |  69 --
 drivers/net/sfc/sfc_ev.c |   2 +-
 drivers/net/sfc/sfc_ev.h |  22 -
 drivers/net/sfc/sfc_flow.c   |  22 +++--
 drivers/net/sfc/sfc_rx.c | 179 +--
 drivers/net/sfc/sfc_rx.h |  10 +-
 8 files changed, 215 insertions(+), 95 deletions(-)

diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index b48a818adb..ebe705020d 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -29,6 +29,7 @@
 #include "sfc_filter.h"
 #include "sfc_sriov.h"
 #include "sfc_mae.h"
+#include "sfc_dp.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -168,6 +169,7 @@ struct sfc_rss {
 struct sfc_adapter_shared {
unsigned intrxq_count;
struct sfc_rxq_info *rxq_info;
+   unsigned intethdev_rxq_count;
 
unsigned inttxq_count;
struct sfc_txq_info *txq_info;
diff --git a/drivers/net/sfc/sfc_dp.h b/drivers/net/sfc/sfc_dp.h
index 4bed137806..76065483d4 100644
--- a/drivers/net/sfc/sfc_dp.h
+++ b/drivers/net/sfc/sfc_dp.h
@@ -96,6 +96,10 @@ struct sfc_dp {
 /** List of datapath variants */
 TAILQ_HEAD(sfc_dp_list, sfc_dp);
 
+typedef unsigned int sfc_sw_index_t;
+typedef int32_tsfc_ethdev_qid_t;
+#define SFC_ETHDEV_QID_INVALID ((sfc_ethdev_qid_t)(-1))
+
 /* Check if available HW/FW capabilities are sufficient for the datapath */
 static inline bool
 sfc_dp_match_hw_fw_caps(const struct sfc_dp *dp, unsigned int avail_caps)
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index c50ecea0b9..2651c41288 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -463,26 +463,31 @@ sfc_dev_allmulti_disable(struct rte_eth_dev *dev)
 }
 
 static int
-sfc_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+sfc_rx_queue_setup(struct rte_eth_dev *dev, uint16_t ethdev_qid,
   uint16_t nb_rx_desc, unsigned int socket_id,
   const struct rte_eth_rxconf *rx_conf,
   struct rte_mempool *mb_pool)
 {
struct sfc_adapter_shared *sas = sfc_adapter_shared_by_eth_dev(dev);
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
+   sfc_ethdev_qid_t sfc_ethdev_qid = ethdev_qid;
+   struct sfc_rxq_info *rxq_info;
+   sfc_sw_index_t sw_index;
int rc;
 
sfc_log_init(sa, "RxQ=%u nb_rx_desc=%u socket_id=%u",
-rx_queue_id, nb_rx_desc, socket_id);
+ethdev_qid, nb_rx_desc, socket_id);
 
sfc_adapter_lock(sa);
 
-   rc = sfc_rx_qinit(sa, rx_queue_id, nb_rx_desc, socket_id,
+   sw_index = sfc_rxq_sw_index_by_ethdev_rx_qid(sas, sfc_ethdev_qid);
+   rc = sfc_rx_qinit(sa, sw_index, nb_rx_desc, socket_id,
  rx_conf, mb_pool);
if (rc != 0)
goto fail_rx_qinit;
 
-   dev->data->rx_queues[rx_queue_id] = sas->rxq_info[rx_queue_id].dp;
+   rxq_info = sfc_rxq_info_by_ethdev_qid(sas, sfc_ethdev_qid);
+   dev->data->rx_queues[ethdev_qid] = rxq_info->dp;
 
sfc_adapter_unlock(sa);
 
@@ -500,7 +505,7 @@ sfc_rx_queue_release(void *queue)
struct sfc_dp_rxq *dp_rxq = queue;
struct sfc_rxq *rxq;
struct sfc_adapter *sa;
-   unsigned int sw_index;
+   sfc_sw_index_t sw_index;
 
if (dp_rxq == NULL)
return;
@@ -1182,15 +1187,14 @@ sfc_set_mc_addr_list(struct rte_eth_dev *dev,
  * use any process-local pointers from the adapter data.
  */
 static void
-sfc_rx_queue_info_get(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+sfc_rx_queue_info_get(struct rte_eth_dev *dev, uint16_t ethdev_qid,
  struct rte_eth_rxq_info *qinfo)
 {
struct sfc_adapter_shared *sas = sfc_adapter_shared_by_eth_dev(dev);
+   sfc_ethdev_qid_t sfc_ethdev_qid = ethdev_qid;
struct sfc_rxq_info *rxq_info;
 
-   SFC_ASSERT(rx_queue_id < sas->rxq_count);
-
-   rxq_info = &sas->rxq_info[rx_queue_id];
+   rxq_info = sfc_rxq_info_by_ethdev_qid(sas, sfc_ethdev_qid);
 
qinfo->mp = rxq_info->refill_mb_pool;
qinfo->conf.rx_free_thresh = rxq_info->refill_threshold;
@@ -1232,14 +1236,14 @@ sfc_tx_queue_info_get(struct rte_eth_dev *dev, uint16_t 
tx_queue_id,
  * use any process-local pointers from the adapter data.
  */
 static uint32_t
-sfc_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
+sfc_rx_queue_count(struct rte_eth_dev *dev, uint16_t ethdev_qid)
 {
const struct sfc_adapter

[dpdk-dev] [PATCH v2 00/20] net/sfc: support flow API COUNT action

2021-06-04 Thread Andrew Rybchenko
Update base driver and support COUNT action in transfer flow rules.

v2:
 - add release notes
 - add missing documentaion
 - fix spelling
 - handle query in stopped gracefully

Andrew Rybchenko (6):
  net/sfc: do not enable interrupts on internal Rx queues
  common/sfc_efx/base: separate target EvQ and IRQ config
  common/sfc_efx/base: support custom EvQ to IRQ mapping
  net/sfc: explicitly control IRQ used for Rx queues
  net/sfc: add NUMA-aware registry of service logical cores
  common/sfc_efx/base: add packetiser packet format definition

Igor Romanov (14):
  net/sfc: introduce ethdev Rx queue ID
  net/sfc: introduce ethdev Tx queue ID
  common/sfc_efx/base: add ingress m-port RxQ flag
  common/sfc_efx/base: add user mark RxQ flag
  net/sfc: add abstractions for the management EVQ identity
  net/sfc: add support for initialising different RxQ types
  net/sfc: reserve RxQ for counters
  common/sfc_efx/base: add counter creation MCDI wrappers
  common/sfc_efx/base: add counter stream MCDI wrappers
  common/sfc_efx/base: support counter in action set
  net/sfc: add Rx datapath method to get pushed buffers count
  common/sfc_efx/base: add max MAE counters to limits
  net/sfc: support flow action COUNT in transfer rules
  net/sfc: support flow API query for count actions

 doc/guides/nics/sfc_efx.rst   |   2 +
 doc/guides/rel_notes/release_21_08.rst|   6 +
 drivers/common/sfc_efx/base/ef10_ev.c |  14 +-
 drivers/common/sfc_efx/base/ef10_impl.h   |   1 +
 drivers/common/sfc_efx/base/ef10_rx.c |  57 +-
 drivers/common/sfc_efx/base/efx.h | 113 +++
 drivers/common/sfc_efx/base/efx_ev.c  |  39 +-
 drivers/common/sfc_efx/base/efx_impl.h|   8 +-
 drivers/common/sfc_efx/base/efx_mae.c | 430 -
 drivers/common/sfc_efx/base/efx_mcdi.c|   7 +-
 drivers/common/sfc_efx/base/efx_mcdi.h|   7 +
 .../base/efx_regs_counters_pkt_format.h   |  87 ++
 drivers/common/sfc_efx/base/efx_rx.c  |  14 +-
 drivers/common/sfc_efx/base/rhead_ev.c|  14 +-
 drivers/common/sfc_efx/base/rhead_impl.h  |   1 +
 drivers/common/sfc_efx/base/rhead_rx.c|   6 +
 drivers/common/sfc_efx/version.map|   9 +
 drivers/net/sfc/meson.build   |  12 +
 drivers/net/sfc/sfc.c |  68 +-
 drivers/net/sfc/sfc.h |  22 +
 drivers/net/sfc/sfc_dp.h  |   6 +
 drivers/net/sfc/sfc_dp_rx.h   |   4 +
 drivers/net/sfc/sfc_ef100_rx.c|  15 +
 drivers/net/sfc/sfc_ethdev.c  | 115 ++-
 drivers/net/sfc/sfc_ev.c  |  36 +-
 drivers/net/sfc/sfc_ev.h  | 107 ++-
 drivers/net/sfc/sfc_flow.c|  77 +-
 drivers/net/sfc/sfc_flow.h|   6 +
 drivers/net/sfc/sfc_mae.c | 296 ++-
 drivers/net/sfc/sfc_mae.h |  61 ++
 drivers/net/sfc/sfc_mae_counter.c | 827 ++
 drivers/net/sfc/sfc_mae_counter.h |  58 ++
 drivers/net/sfc/sfc_rx.c  | 231 +++--
 drivers/net/sfc/sfc_rx.h  |  15 +-
 drivers/net/sfc/sfc_service.c |  99 +++
 drivers/net/sfc/sfc_service.h |  20 +
 drivers/net/sfc/sfc_stats.h   |  80 ++
 drivers/net/sfc/sfc_tweak.h   |   9 +
 drivers/net/sfc/sfc_tx.c  | 164 ++--
 drivers/net/sfc/sfc_tx.h  |  11 +-
 40 files changed, 2904 insertions(+), 250 deletions(-)
 create mode 100644 drivers/common/sfc_efx/base/efx_regs_counters_pkt_format.h
 create mode 100644 drivers/net/sfc/sfc_mae_counter.c
 create mode 100644 drivers/net/sfc/sfc_mae_counter.h
 create mode 100644 drivers/net/sfc/sfc_service.c
 create mode 100644 drivers/net/sfc/sfc_service.h
 create mode 100644 drivers/net/sfc/sfc_stats.h

-- 
2.30.2



[dpdk-dev] [PATCH v2 02/20] net/sfc: do not enable interrupts on internal Rx queues

2021-06-04 Thread Andrew Rybchenko
The rxq_intr flag requests interrupt mode support for ethdev Rx queues.
There are no internal Rx queues yet.

Signed-off-by: Andrew Rybchenko 
---
 drivers/net/sfc/sfc_ev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/sfc/sfc_ev.c b/drivers/net/sfc/sfc_ev.c
index 2262994112..9a8149f052 100644
--- a/drivers/net/sfc/sfc_ev.c
+++ b/drivers/net/sfc/sfc_ev.c
@@ -663,7 +663,9 @@ sfc_ev_qstart(struct sfc_evq *evq, unsigned int hw_index)
 efx_evq_size(sa->nic, evq->entries, evq_flags));
 
if ((sa->intr.lsc_intr && hw_index == sa->mgmt_evq_index) ||
-   (sa->intr.rxq_intr && evq->dp_rxq != NULL))
+   (sa->intr.rxq_intr && evq->dp_rxq != NULL &&
+sfc_ethdev_rx_qid_by_rxq_sw_index(sfc_sa2shared(sa),
+   evq->dp_rxq->dpq.queue_id) != SFC_ETHDEV_QID_INVALID))
evq_flags |= EFX_EVQ_FLAGS_NOTIFY_INTERRUPT;
else
evq_flags |= EFX_EVQ_FLAGS_NOTIFY_DISABLED;
-- 
2.30.2



[dpdk-dev] [PATCH v2 03/20] common/sfc_efx/base: separate target EvQ and IRQ config

2021-06-04 Thread Andrew Rybchenko
Target EvQ and IRQ number are specified in the same location
in the MCDI request. The value is treated as an IRQ number if the
event queue is interrupting (the corresponding flag is set) and
as a target event queue otherwise.

However, it is better to separate them at the helper API level to
make it clearer.

Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/common/sfc_efx/base/ef10_ev.c  | 12 +++-
 drivers/common/sfc_efx/base/efx_impl.h |  1 +
 drivers/common/sfc_efx/base/efx_mcdi.c |  7 ++-
 drivers/common/sfc_efx/base/rhead_ev.c | 12 +++-
 4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/common/sfc_efx/base/ef10_ev.c 
b/drivers/common/sfc_efx/base/ef10_ev.c
index ea59beecc4..c0cbc427b9 100644
--- a/drivers/common/sfc_efx/base/ef10_ev.c
+++ b/drivers/common/sfc_efx/base/ef10_ev.c
@@ -121,7 +121,8 @@ ef10_ev_qcreate(
__inefx_evq_t *eep)
 {
efx_nic_cfg_t *encp = &(enp->en_nic_cfg);
-   uint32_t irq;
+   uint32_t irq = 0;
+   uint32_t target_evq = 0;
efx_rc_t rc;
boolean_t low_latency;
 
@@ -159,11 +160,12 @@ ef10_ev_qcreate(
EFX_EVQ_FLAGS_NOTIFY_INTERRUPT) {
irq = index;
} else if (index == EFX_EF10_ALWAYS_INTERRUPTING_EVQ_INDEX) {
-   irq = index;
+   /* Use the first interrupt for always interrupting EvQ */
+   irq = 0;
flags = (flags & ~EFX_EVQ_FLAGS_NOTIFY_MASK) |
EFX_EVQ_FLAGS_NOTIFY_INTERRUPT;
} else {
-   irq = EFX_EF10_ALWAYS_INTERRUPTING_EVQ_INDEX;
+   target_evq = EFX_EF10_ALWAYS_INTERRUPTING_EVQ_INDEX;
}
 
/*
@@ -187,8 +189,8 @@ ef10_ev_qcreate(
 * decision and low_latency hint is ignored.
 */
low_latency = encp->enc_datapath_cap_evb ? 0 : 1;
-   rc = efx_mcdi_init_evq(enp, index, esmp, ndescs, irq, us, flags,
-   low_latency);
+   rc = efx_mcdi_init_evq(enp, index, esmp, ndescs, irq, target_evq, us,
+   flags, low_latency);
if (rc != 0)
goto fail2;
 
diff --git a/drivers/common/sfc_efx/base/efx_impl.h 
b/drivers/common/sfc_efx/base/efx_impl.h
index 8b63cfb37d..4fff9e1842 100644
--- a/drivers/common/sfc_efx/base/efx_impl.h
+++ b/drivers/common/sfc_efx/base/efx_impl.h
@@ -1535,6 +1535,7 @@ efx_mcdi_init_evq(
__inefsys_mem_t *esmp,
__insize_t nevs,
__inuint32_t irq,
+   __inuint32_t target_evq,
__inuint32_t us,
__inuint32_t flags,
__inboolean_t low_latency);
diff --git a/drivers/common/sfc_efx/base/efx_mcdi.c 
b/drivers/common/sfc_efx/base/efx_mcdi.c
index f226ffd923..b68fc0503d 100644
--- a/drivers/common/sfc_efx/base/efx_mcdi.c
+++ b/drivers/common/sfc_efx/base/efx_mcdi.c
@@ -2568,6 +2568,7 @@ efx_mcdi_init_evq(
__inefsys_mem_t *esmp,
__insize_t nevs,
__inuint32_t irq,
+   __inuint32_t target_evq,
__inuint32_t us,
__inuint32_t flags,
__inboolean_t low_latency)
@@ -2602,11 +2603,15 @@ efx_mcdi_init_evq(
 
MCDI_IN_SET_DWORD(req, INIT_EVQ_V2_IN_SIZE, nevs);
MCDI_IN_SET_DWORD(req, INIT_EVQ_V2_IN_INSTANCE, instance);
-   MCDI_IN_SET_DWORD(req, INIT_EVQ_V2_IN_IRQ_NUM, irq);
 
interrupting = ((flags & EFX_EVQ_FLAGS_NOTIFY_MASK) ==
EFX_EVQ_FLAGS_NOTIFY_INTERRUPT);
 
+   if (interrupting)
+   MCDI_IN_SET_DWORD(req, INIT_EVQ_V2_IN_IRQ_NUM, irq);
+   else
+   MCDI_IN_SET_DWORD(req, INIT_EVQ_V2_IN_TARGET_EVQ, target_evq);
+
if (encp->enc_init_evq_v2_supported) {
/*
 * On Medford the low latency license is required to enable RX
diff --git a/drivers/common/sfc_efx/base/rhead_ev.c 
b/drivers/common/sfc_efx/base/rhead_ev.c
index 2099581fd7..533cd9e34a 100644
--- a/drivers/common/sfc_efx/base/rhead_ev.c
+++ b/drivers/common/sfc_efx/base/rhead_ev.c
@@ -106,7 +106,8 @@ rhead_ev_qcreate(
 {
const efx_nic_cfg_t *encp = efx_nic_cfg_get(enp);
size_t desc_size;
-   uint32_t irq;
+   uint32_t irq = 0;
+   uint32_t target_evq = 0;
efx_rc_t rc;
 
_NOTE(ARGUNUSED(id))/* buftbl id managed by MC */
@@ -142,19 +143,20 @@ rhead_ev_qcreate(
EFX_EVQ_FLAGS_NOTIFY_INTERRUPT) {
irq = index;
} else if (index == EFX_RHEAD_ALWAYS_INTERRUPTING_EVQ_INDEX) {
-   irq = index;
+   /* Use the first interrupt for always interrupting EvQ */
+   irq = 0;
flags = (flags & ~EFX_EVQ_FLAGS_NOTIFY_MASK) |
EFX_EVQ_FLAGS_NOTIFY_INTERRUPT;
} else {
-   irq = EFX_RHEAD_ALWAYS_INTERRUPTING_EVQ_INDEX;
+   target_evq = EFX_RHEAD_ALWAYS_INTERRUPTING_EVQ_INDEX;
   

[dpdk-dev] [PATCH v2 04/20] common/sfc_efx/base: support custom EvQ to IRQ mapping

2021-06-04 Thread Andrew Rybchenko
Custom mapping is actually supported for EF10 and EF100 families only.

A driver (e.g. a DPDK PMD) may need to customize the mapping of EvQs
to interrupts if, for example, extra EvQs are used for house-keeping
in polling or wake-up (via another EvQ) mode.

Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/common/sfc_efx/base/ef10_ev.c|  4 +--
 drivers/common/sfc_efx/base/ef10_impl.h  |  1 +
 drivers/common/sfc_efx/base/efx.h| 13 
 drivers/common/sfc_efx/base/efx_ev.c | 39 
 drivers/common/sfc_efx/base/efx_impl.h   |  3 +-
 drivers/common/sfc_efx/base/rhead_ev.c   |  4 +--
 drivers/common/sfc_efx/base/rhead_impl.h |  1 +
 drivers/common/sfc_efx/version.map   |  1 +
 8 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/common/sfc_efx/base/ef10_ev.c 
b/drivers/common/sfc_efx/base/ef10_ev.c
index c0cbc427b9..ba078940b6 100644
--- a/drivers/common/sfc_efx/base/ef10_ev.c
+++ b/drivers/common/sfc_efx/base/ef10_ev.c
@@ -118,10 +118,10 @@ ef10_ev_qcreate(
__inuint32_t id,
__inuint32_t us,
__inuint32_t flags,
+   __inuint32_t irq,
__inefx_evq_t *eep)
 {
efx_nic_cfg_t *encp = &(enp->en_nic_cfg);
-   uint32_t irq = 0;
uint32_t target_evq = 0;
efx_rc_t rc;
boolean_t low_latency;
@@ -158,7 +158,7 @@ ef10_ev_qcreate(
/* INIT_EVQ expects function-relative vector number */
if ((flags & EFX_EVQ_FLAGS_NOTIFY_MASK) ==
EFX_EVQ_FLAGS_NOTIFY_INTERRUPT) {
-   irq = index;
+   /* IRQ number is specified by caller */
} else if (index == EFX_EF10_ALWAYS_INTERRUPTING_EVQ_INDEX) {
/* Use the first interrupt for always interrupting EvQ */
irq = 0;
diff --git a/drivers/common/sfc_efx/base/ef10_impl.h 
b/drivers/common/sfc_efx/base/ef10_impl.h
index 40210fbd91..7c8d51b7a5 100644
--- a/drivers/common/sfc_efx/base/ef10_impl.h
+++ b/drivers/common/sfc_efx/base/ef10_impl.h
@@ -111,6 +111,7 @@ ef10_ev_qcreate(
__inuint32_t id,
__inuint32_t us,
__inuint32_t flags,
+   __inuint32_t irq,
__inefx_evq_t *eep);
 
 LIBEFX_INTERNAL
diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index 8e13075b07..6a99099ad2 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -2333,6 +2333,19 @@ efx_ev_qcreate(
__inuint32_t flags,
__deref_out efx_evq_t **eepp);
 
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_ev_qcreate_irq(
+   __inefx_nic_t *enp,
+   __inunsigned int index,
+   __inefsys_mem_t *esmp,
+   __insize_t ndescs,
+   __inuint32_t id,
+   __inuint32_t us,
+   __inuint32_t flags,
+   __inuint32_t irq,
+   __deref_out efx_evq_t **eepp);
+
 LIBEFX_API
 extern void
 efx_ev_qpost(
diff --git a/drivers/common/sfc_efx/base/efx_ev.c 
b/drivers/common/sfc_efx/base/efx_ev.c
index 19bdea03fd..4808f8ddfc 100644
--- a/drivers/common/sfc_efx/base/efx_ev.c
+++ b/drivers/common/sfc_efx/base/efx_ev.c
@@ -35,6 +35,7 @@ siena_ev_qcreate(
__inuint32_t id,
__inuint32_t us,
__inuint32_t flags,
+   __inuint32_t irq,
__inefx_evq_t *eep);
 
 static void
@@ -253,7 +254,7 @@ efx_ev_fini(
 
 
__checkReturn   efx_rc_t
-efx_ev_qcreate(
+efx_ev_qcreate_irq(
__inefx_nic_t *enp,
__inunsigned int index,
__inefsys_mem_t *esmp,
@@ -261,6 +262,7 @@ efx_ev_qcreate(
__inuint32_t id,
__inuint32_t us,
__inuint32_t flags,
+   __inuint32_t irq,
__deref_out efx_evq_t **eepp)
 {
const efx_ev_ops_t *eevop = enp->en_eevop;
@@ -347,7 +349,7 @@ efx_ev_qcreate(
*eepp = eep;
 
if ((rc = eevop->eevo_qcreate(enp, index, esmp, ndescs, id, us, flags,
-   eep)) != 0)
+   irq, eep)) != 0)
goto fail9;
 
return (0);
@@ -377,6 +379,23 @@ efx_ev_qcreate(
return (rc);
 }
 
+   __checkReturn   efx_rc_t
+efx_ev_qcreate(
+   __inefx_nic_t *enp,
+   __inunsigned int index,
+   __inefsys_mem_t *esmp,
+   __insize_t ndescs,
+   __inuint32_t id,
+   __inuint32_t us,
+   __inuint32_t flags,
+   __deref_out efx_evq_t **eepp)
+{
+   uint32_t irq = index;
+
+   return (efx_ev_qcreate_irq(enp, index, esmp, ndescs, id, us, flags,
+   irq, eepp));
+}
+
void
 efx_ev_qdestroy(
__inefx_

[dpdk-dev] [PATCH v2 06/20] net/sfc: introduce ethdev Tx queue ID

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Make software index of a Tx queue and ethdev index separate.
When an ethdev TxQ is accessed in ethdev callbacks, an explicit ethdev
queue index is used.

This is a preparation for introducing non-ethdev Tx queues.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc.h|   1 +
 drivers/net/sfc/sfc_ethdev.c |  46 ++
 drivers/net/sfc/sfc_ev.c |   2 +-
 drivers/net/sfc/sfc_ev.h |  21 -
 drivers/net/sfc/sfc_tx.c | 164 ---
 drivers/net/sfc/sfc_tx.h |  11 +--
 6 files changed, 171 insertions(+), 74 deletions(-)

diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index ebe705020d..00fc26cf0e 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -173,6 +173,7 @@ struct sfc_adapter_shared {
 
unsigned inttxq_count;
struct sfc_txq_info *txq_info;
+   unsigned intethdev_txq_count;
 
struct sfc_rss  rss;
 
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index 2651c41288..88896db1f8 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -524,24 +524,28 @@ sfc_rx_queue_release(void *queue)
 }
 
 static int
-sfc_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+sfc_tx_queue_setup(struct rte_eth_dev *dev, uint16_t ethdev_qid,
   uint16_t nb_tx_desc, unsigned int socket_id,
   const struct rte_eth_txconf *tx_conf)
 {
struct sfc_adapter_shared *sas = sfc_adapter_shared_by_eth_dev(dev);
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
+   struct sfc_txq_info *txq_info;
+   sfc_sw_index_t sw_index;
int rc;
 
sfc_log_init(sa, "TxQ = %u, nb_tx_desc = %u, socket_id = %u",
-tx_queue_id, nb_tx_desc, socket_id);
+ethdev_qid, nb_tx_desc, socket_id);
 
sfc_adapter_lock(sa);
 
-   rc = sfc_tx_qinit(sa, tx_queue_id, nb_tx_desc, socket_id, tx_conf);
+   sw_index = sfc_txq_sw_index_by_ethdev_tx_qid(sas, ethdev_qid);
+   rc = sfc_tx_qinit(sa, sw_index, nb_tx_desc, socket_id, tx_conf);
if (rc != 0)
goto fail_tx_qinit;
 
-   dev->data->tx_queues[tx_queue_id] = sas->txq_info[tx_queue_id].dp;
+   txq_info = sfc_txq_info_by_ethdev_qid(sas, ethdev_qid);
+   dev->data->tx_queues[ethdev_qid] = txq_info->dp;
 
sfc_adapter_unlock(sa);
return 0;
@@ -557,7 +561,7 @@ sfc_tx_queue_release(void *queue)
 {
struct sfc_dp_txq *dp_txq = queue;
struct sfc_txq *txq;
-   unsigned int sw_index;
+   sfc_sw_index_t sw_index;
struct sfc_adapter *sa;
 
if (dp_txq == NULL)
@@ -1213,15 +1217,15 @@ sfc_rx_queue_info_get(struct rte_eth_dev *dev, uint16_t 
ethdev_qid,
  * use any process-local pointers from the adapter data.
  */
 static void
-sfc_tx_queue_info_get(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+sfc_tx_queue_info_get(struct rte_eth_dev *dev, uint16_t ethdev_qid,
  struct rte_eth_txq_info *qinfo)
 {
struct sfc_adapter_shared *sas = sfc_adapter_shared_by_eth_dev(dev);
struct sfc_txq_info *txq_info;
 
-   SFC_ASSERT(tx_queue_id < sas->txq_count);
+   SFC_ASSERT(ethdev_qid < sas->ethdev_txq_count);
 
-   txq_info = &sas->txq_info[tx_queue_id];
+   txq_info = sfc_txq_info_by_ethdev_qid(sas, ethdev_qid);
 
memset(qinfo, 0, sizeof(*qinfo));
 
@@ -1362,13 +1366,15 @@ sfc_rx_queue_stop(struct rte_eth_dev *dev, uint16_t 
ethdev_qid)
 }
 
 static int
-sfc_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
+sfc_tx_queue_start(struct rte_eth_dev *dev, uint16_t ethdev_qid)
 {
struct sfc_adapter_shared *sas = sfc_adapter_shared_by_eth_dev(dev);
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
+   struct sfc_txq_info *txq_info;
+   sfc_sw_index_t sw_index;
int rc;
 
-   sfc_log_init(sa, "TxQ = %u", tx_queue_id);
+   sfc_log_init(sa, "TxQ = %u", ethdev_qid);
 
sfc_adapter_lock(sa);
 
@@ -1376,14 +1382,16 @@ sfc_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
if (sa->state != SFC_ADAPTER_STARTED)
goto fail_not_started;
 
-   if (sas->txq_info[tx_queue_id].state != SFC_TXQ_INITIALIZED)
+   txq_info = sfc_txq_info_by_ethdev_qid(sas, ethdev_qid);
+   if (txq_info->state != SFC_TXQ_INITIALIZED)
goto fail_not_setup;
 
-   rc = sfc_tx_qstart(sa, tx_queue_id);
+   sw_index = sfc_txq_sw_index_by_ethdev_tx_qid(sas, ethdev_qid);
+   rc = sfc_tx_qstart(sa, sw_index);
if (rc != 0)
goto fail_tx_qstart;
 
-   sas->txq_info[tx_queue_id].deferred_started = B_TRUE;
+   txq_info->deferred_started = B_TRUE;
 
sfc_adapter_unlock(sa);
return 0;
@@ -1398,18 +1406,22 @@ sfc_tx_

[dpdk-dev] [PATCH v2 05/20] net/sfc: explicitly control IRQ used for Rx queues

2021-06-04 Thread Andrew Rybchenko
Interrupt support makes assumptions about the interrupt numbers used
for LSC and Rx queues. The first interrupt is used for LSC,
subsequent interrupts are used for Rx queues.

Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc_ev.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/sfc/sfc_ev.c b/drivers/net/sfc/sfc_ev.c
index 9a8149f052..71f706e403 100644
--- a/drivers/net/sfc/sfc_ev.c
+++ b/drivers/net/sfc/sfc_ev.c
@@ -648,6 +648,7 @@ sfc_ev_qstart(struct sfc_evq *evq, unsigned int hw_index)
struct sfc_adapter *sa = evq->sa;
efsys_mem_t *esmp;
uint32_t evq_flags = sa->evq_flags;
+   uint32_t irq = 0;
unsigned int total_delay_us;
unsigned int delay_us;
int rc;
@@ -662,20 +663,35 @@ sfc_ev_qstart(struct sfc_evq *evq, unsigned int hw_index)
(void)memset((void *)esmp->esm_base, 0xff,
 efx_evq_size(sa->nic, evq->entries, evq_flags));
 
-   if ((sa->intr.lsc_intr && hw_index == sa->mgmt_evq_index) ||
-   (sa->intr.rxq_intr && evq->dp_rxq != NULL &&
-sfc_ethdev_rx_qid_by_rxq_sw_index(sfc_sa2shared(sa),
-   evq->dp_rxq->dpq.queue_id) != SFC_ETHDEV_QID_INVALID))
+   if (sa->intr.lsc_intr && hw_index == sa->mgmt_evq_index) {
evq_flags |= EFX_EVQ_FLAGS_NOTIFY_INTERRUPT;
-   else
+   irq = 0;
+   } else if (sa->intr.rxq_intr && evq->dp_rxq != NULL) {
+   sfc_ethdev_qid_t ethdev_qid;
+
+   ethdev_qid =
+   sfc_ethdev_rx_qid_by_rxq_sw_index(sfc_sa2shared(sa),
+   evq->dp_rxq->dpq.queue_id);
+   if (ethdev_qid != SFC_ETHDEV_QID_INVALID) {
+   evq_flags |= EFX_EVQ_FLAGS_NOTIFY_INTERRUPT;
+   /*
+* The first interrupt is used for management EvQ
+* (LSC etc). RxQ interrupts follow it.
+*/
+   irq = 1 + ethdev_qid;
+   } else {
+   evq_flags |= EFX_EVQ_FLAGS_NOTIFY_DISABLED;
+   }
+   } else {
evq_flags |= EFX_EVQ_FLAGS_NOTIFY_DISABLED;
+   }
 
evq->init_state = SFC_EVQ_STARTING;
 
/* Create the common code event queue */
-   rc = efx_ev_qcreate(sa->nic, hw_index, esmp, evq->entries,
-   0 /* unused on EF10 */, 0, evq_flags,
-   &evq->common);
+   rc = efx_ev_qcreate_irq(sa->nic, hw_index, esmp, evq->entries,
+   0 /* unused on EF10 */, 0, evq_flags,
+   irq, &evq->common);
if (rc != 0)
goto fail_ev_qcreate;
 
-- 
2.30.2



[dpdk-dev] [PATCH v2 08/20] common/sfc_efx/base: add user mark RxQ flag

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Add a flag to request support for the user mark field on an RxQ.
The field is required to retrieve the generation count value from
the counter RxQ.

Implement it only for Riverhead and EF10 ESSB since they support
the field in the Rx prefix.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/common/sfc_efx/base/ef10_rx.c  | 52 --
 drivers/common/sfc_efx/base/efx.h  |  4 ++
 drivers/common/sfc_efx/base/rhead_rx.c |  3 ++
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/drivers/common/sfc_efx/base/ef10_rx.c 
b/drivers/common/sfc_efx/base/ef10_rx.c
index 0e140645a5..0c3f9413cf 100644
--- a/drivers/common/sfc_efx/base/ef10_rx.c
+++ b/drivers/common/sfc_efx/base/ef10_rx.c
@@ -926,6 +926,10 @@ ef10_rx_qcreate(
goto fail1;
}
erp->er_buf_size = type_data->ertd_default.ed_buf_size;
+   if (flags & EFX_RXQ_FLAG_USER_MARK) {
+   rc = ENOTSUP;
+   goto fail2;
+   }
/*
 * Ignore EFX_RXQ_FLAG_RSS_HASH since if RSS hash is calculated
 * it is always delivered from HW in the pseudo-header.
@@ -936,7 +940,7 @@ ef10_rx_qcreate(
erpl = &ef10_packed_stream_rx_prefix_layout;
if (type_data == NULL) {
rc = EINVAL;
-   goto fail2;
+   goto fail3;
}
switch (type_data->ertd_packed_stream.eps_buf_size) {
case EFX_RXQ_PACKED_STREAM_BUF_SIZE_1M:
@@ -956,13 +960,17 @@ ef10_rx_qcreate(
break;
default:
rc = ENOTSUP;
-   goto fail3;
+   goto fail4;
}
erp->er_buf_size = type_data->ertd_packed_stream.eps_buf_size;
/* Packed stream pseudo header does not have RSS hash value */
if (flags & EFX_RXQ_FLAG_RSS_HASH) {
rc = ENOTSUP;
-   goto fail4;
+   goto fail5;
+   }
+   if (flags & EFX_RXQ_FLAG_USER_MARK) {
+   rc = ENOTSUP;
+   goto fail6;
}
break;
 #endif /* EFSYS_OPT_RX_PACKED_STREAM */
@@ -971,7 +979,7 @@ ef10_rx_qcreate(
erpl = &ef10_essb_rx_prefix_layout;
if (type_data == NULL) {
rc = EINVAL;
-   goto fail5;
+   goto fail7;
}
params.es_bufs_per_desc =
type_data->ertd_es_super_buffer.eessb_bufs_per_desc;
@@ -989,7 +997,7 @@ ef10_rx_qcreate(
 #endif /* EFSYS_OPT_RX_ES_SUPER_BUFFER */
default:
rc = ENOTSUP;
-   goto fail6;
+   goto fail8;
}
 
 #if EFSYS_OPT_RX_PACKED_STREAM
@@ -997,13 +1005,13 @@ ef10_rx_qcreate(
/* Check if datapath firmware supports packed stream mode */
if (encp->enc_rx_packed_stream_supported == B_FALSE) {
rc = ENOTSUP;
-   goto fail7;
+   goto fail9;
}
/* Check if packed stream allows configurable buffer sizes */
if ((params.ps_buf_size != MC_CMD_INIT_RXQ_EXT_IN_PS_BUFF_1M) &&
(encp->enc_rx_var_packed_stream_supported == B_FALSE)) {
rc = ENOTSUP;
-   goto fail8;
+   goto fail10;
}
}
 #else /* EFSYS_OPT_RX_PACKED_STREAM */
@@ -1014,17 +1022,17 @@ ef10_rx_qcreate(
if (params.es_bufs_per_desc > 0) {
if (encp->enc_rx_es_super_buffer_supported == B_FALSE) {
rc = ENOTSUP;
-   goto fail9;
+   goto fail11;
}
if (!EFX_IS_P2ALIGNED(uint32_t, params.es_max_dma_len,
EFX_RX_ES_SUPER_BUFFER_BUF_ALIGNMENT)) {
rc = EINVAL;
-   goto fail10;
+   goto fail12;
}
if (!EFX_IS_P2ALIGNED(uint32_t, params.es_buf_stride,
EFX_RX_ES_SUPER_BUFFER_BUF_ALIGNMENT)) {
rc = EINVAL;
-   goto fail11;
+   goto fail13;
}
}
 #else /* EFSYS_OPT_RX_ES_SUPER_BUFFER */
@@ -1033,7 +1041,7 @@ ef10_rx_qcreate(
 
if (flags & EFX_RXQ_FLAG_INGRESS_MPORT) {
rc = ENOTSUP;
-   goto fail12;
+   goto fail14;
}
 
/* Scatter can only be disabled if the firmware supports doing so */
@@ -1049,7 +1057,7 @@ ef10_rx_qcreate(
 
if ((rc = efx_mcdi_init_r

[dpdk-dev] [PATCH v2 10/20] net/sfc: add support for initialising different RxQ types

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Add extra EFX flags to the RxQ info initialization API to support
choosing different RxQ types, and make the API public so that it
can be used for counter queues.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc_rx.c | 10 ++
 drivers/net/sfc/sfc_rx.h |  2 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/sfc/sfc_rx.c b/drivers/net/sfc/sfc_rx.c
index 597785ae02..c7a7bd66ef 100644
--- a/drivers/net/sfc/sfc_rx.c
+++ b/drivers/net/sfc/sfc_rx.c
@@ -1155,7 +1155,7 @@ sfc_rx_qinit(struct sfc_adapter *sa, sfc_sw_index_t 
sw_index,
else
rxq_info->type = EFX_RXQ_TYPE_DEFAULT;
 
-   rxq_info->type_flags =
+   rxq_info->type_flags |=
(offloads & DEV_RX_OFFLOAD_SCATTER) ?
EFX_RXQ_FLAG_SCATTER : EFX_RXQ_FLAG_NONE;
 
@@ -1594,8 +1594,9 @@ sfc_rx_stop(struct sfc_adapter *sa)
efx_rx_fini(sa->nic);
 }
 
-static int
-sfc_rx_qinit_info(struct sfc_adapter *sa, sfc_sw_index_t sw_index)
+int
+sfc_rx_qinit_info(struct sfc_adapter *sa, sfc_sw_index_t sw_index,
+ unsigned int extra_efx_type_flags)
 {
struct sfc_adapter_shared * const sas = sfc_sa2shared(sa);
struct sfc_rxq_info *rxq_info = &sas->rxq_info[sw_index];
@@ -1606,6 +1607,7 @@ sfc_rx_qinit_info(struct sfc_adapter *sa, sfc_sw_index_t 
sw_index)
SFC_ASSERT(rte_is_power_of_2(max_entries));
 
rxq_info->max_entries = max_entries;
+   rxq_info->type_flags = extra_efx_type_flags;
 
return 0;
 }
@@ -1770,7 +1772,7 @@ sfc_rx_configure(struct sfc_adapter *sa)
 
sw_index = sfc_rxq_sw_index_by_ethdev_rx_qid(sas,
sas->ethdev_rxq_count);
-   rc = sfc_rx_qinit_info(sa, sw_index);
+   rc = sfc_rx_qinit_info(sa, sw_index, 0);
if (rc != 0)
goto fail_rx_qinit_info;
 
diff --git a/drivers/net/sfc/sfc_rx.h b/drivers/net/sfc/sfc_rx.h
index 96c7dc415d..e5a6fde79b 100644
--- a/drivers/net/sfc/sfc_rx.h
+++ b/drivers/net/sfc/sfc_rx.h
@@ -129,6 +129,8 @@ void sfc_rx_close(struct sfc_adapter *sa);
 int sfc_rx_start(struct sfc_adapter *sa);
 void sfc_rx_stop(struct sfc_adapter *sa);
 
+int sfc_rx_qinit_info(struct sfc_adapter *sa, sfc_sw_index_t sw_index,
+ unsigned int extra_efx_type_flags);
 int sfc_rx_qinit(struct sfc_adapter *sa, unsigned int rx_queue_id,
 uint16_t nb_rx_desc, unsigned int socket_id,
 const struct rte_eth_rxconf *rx_conf,
-- 
2.30.2



[dpdk-dev] [PATCH v2 07/20] common/sfc_efx/base: add ingress m-port RxQ flag

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Add a flag to request support for ingress m-port on an RxQ.
Implement it only for Riverhead; other families will return an error
if the flag is set.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/common/sfc_efx/base/ef10_rx.c  |  9 -
 drivers/common/sfc_efx/base/efx.h  |  5 +
 drivers/common/sfc_efx/base/efx_rx.c   | 14 +-
 drivers/common/sfc_efx/base/rhead_rx.c |  3 +++
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/common/sfc_efx/base/ef10_rx.c 
b/drivers/common/sfc_efx/base/ef10_rx.c
index cfa60bd324..0e140645a5 100644
--- a/drivers/common/sfc_efx/base/ef10_rx.c
+++ b/drivers/common/sfc_efx/base/ef10_rx.c
@@ -1031,6 +1031,11 @@ ef10_rx_qcreate(
EFSYS_ASSERT(params.es_bufs_per_desc == 0);
 #endif /* EFSYS_OPT_RX_ES_SUPER_BUFFER */
 
+   if (flags & EFX_RXQ_FLAG_INGRESS_MPORT) {
+   rc = ENOTSUP;
+   goto fail12;
+   }
+
/* Scatter can only be disabled if the firmware supports doing so */
if (flags & EFX_RXQ_FLAG_SCATTER)
params.disable_scatter = B_FALSE;
@@ -1044,7 +1049,7 @@ ef10_rx_qcreate(
 
if ((rc = efx_mcdi_init_rxq(enp, ndescs, eep, label, index,
esmp, ¶ms)) != 0)
-   goto fail12;
+   goto fail13;
 
erp->er_eep = eep;
erp->er_label = label;
@@ -1057,6 +1062,8 @@ ef10_rx_qcreate(
 
return (0);
 
+fail13:
+   EFSYS_PROBE(fail13);
 fail12:
EFSYS_PROBE(fail12);
 #if EFSYS_OPT_RX_ES_SUPER_BUFFER
diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index 6a99099ad2..72ab4af01c 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -2925,6 +2925,7 @@ typedef enum efx_rx_prefix_field_e {
EFX_RX_PREFIX_FIELD_USER_MARK_VALID,
EFX_RX_PREFIX_FIELD_CSUM_FRAME,
EFX_RX_PREFIX_FIELD_INGRESS_VPORT,
+   EFX_RX_PREFIX_FIELD_INGRESS_MPORT = EFX_RX_PREFIX_FIELD_INGRESS_VPORT,
EFX_RX_PREFIX_NFIELDS
 } efx_rx_prefix_field_t;
 
@@ -2998,6 +2999,10 @@ typedef enum efx_rxq_type_e {
  * the driver.
  */
 #defineEFX_RXQ_FLAG_RSS_HASH   0x4
+/*
+ * Request ingress mport field in the Rx prefix of a queue.
+ */
+#defineEFX_RXQ_FLAG_INGRESS_MPORT  0x8
 
 LIBEFX_API
 extern __checkReturn   efx_rc_t
diff --git a/drivers/common/sfc_efx/base/efx_rx.c 
b/drivers/common/sfc_efx/base/efx_rx.c
index 7c6fecf925..7e63363be7 100644
--- a/drivers/common/sfc_efx/base/efx_rx.c
+++ b/drivers/common/sfc_efx/base/efx_rx.c
@@ -1743,14 +1743,20 @@ siena_rx_qcreate(
goto fail2;
}
 
-   if (flags & EFX_RXQ_FLAG_SCATTER) {
 #if EFSYS_OPT_RX_SCATTER
-   jumbo = B_TRUE;
+#define SUPPORTED_RXQ_FLAGS EFX_RXQ_FLAG_SCATTER
 #else
+#define SUPPORTED_RXQ_FLAGS EFX_RXQ_FLAG_NONE
+#endif
+   /* Reject flags for unsupported queue features */
+   if ((flags & ~SUPPORTED_RXQ_FLAGS) != 0) {
rc = EINVAL;
goto fail3;
-#endif /* EFSYS_OPT_RX_SCATTER */
}
+#undef SUPPORTED_RXQ_FLAGS
+
+   if (flags & EFX_RXQ_FLAG_SCATTER)
+   jumbo = B_TRUE;
 
/* Set up the new descriptor queue */
EFX_POPULATE_OWORD_7(oword,
@@ -1769,10 +1775,8 @@ siena_rx_qcreate(
 
return (0);
 
-#if !EFSYS_OPT_RX_SCATTER
 fail3:
EFSYS_PROBE(fail3);
-#endif
 fail2:
EFSYS_PROBE(fail2);
 fail1:
diff --git a/drivers/common/sfc_efx/base/rhead_rx.c 
b/drivers/common/sfc_efx/base/rhead_rx.c
index b2dacbab32..f1d46f7c70 100644
--- a/drivers/common/sfc_efx/base/rhead_rx.c
+++ b/drivers/common/sfc_efx/base/rhead_rx.c
@@ -629,6 +629,9 @@ rhead_rx_qcreate(
fields_mask |= 1U << EFX_RX_PREFIX_FIELD_RSS_HASH_VALID;
}
 
+   if (flags & EFX_RXQ_FLAG_INGRESS_MPORT)
+   fields_mask |= 1U << EFX_RX_PREFIX_FIELD_INGRESS_MPORT;
+
/*
 * LENGTH is required in EF100 host interface, as receive events
 * do not include the packet length.
-- 
2.30.2



[dpdk-dev] [PATCH v2 09/20] net/sfc: add abstractions for the management EVQ identity

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Add a function returning the management event queue software index.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc_ev.c | 2 +-
 drivers/net/sfc/sfc_ev.h | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/sfc/sfc_ev.c b/drivers/net/sfc/sfc_ev.c
index ed28d51e12..ba4409369a 100644
--- a/drivers/net/sfc/sfc_ev.c
+++ b/drivers/net/sfc/sfc_ev.c
@@ -983,7 +983,7 @@ sfc_ev_attach(struct sfc_adapter *sa)
goto fail_kvarg_perf_profile;
}
 
-   sa->mgmt_evq_index = 0;
+   sa->mgmt_evq_index = sfc_mgmt_evq_sw_index(sfc_sa2shared(sa));
rte_spinlock_init(&sa->mgmt_evq_lock);
 
rc = sfc_ev_qinit(sa, SFC_EVQ_TYPE_MGMT, 0, sa->evq_min_entries,
diff --git a/drivers/net/sfc/sfc_ev.h b/drivers/net/sfc/sfc_ev.h
index 75b9dcdebd..3f3c4b5b9a 100644
--- a/drivers/net/sfc/sfc_ev.h
+++ b/drivers/net/sfc/sfc_ev.h
@@ -60,6 +60,12 @@ struct sfc_evq {
unsigned intentries;
 };
 
+static inline sfc_sw_index_t
+sfc_mgmt_evq_sw_index(__rte_unused const struct sfc_adapter_shared *sas)
+{
+   return 0;
+}
+
 /*
  * Functions below define event queue to transmit/receive queue and vice
  * versa mapping.
-- 
2.30.2



[dpdk-dev] [PATCH v2 11/20] net/sfc: add NUMA-aware registry of service logical cores

2021-06-04 Thread Andrew Rybchenko
The driver requires service cores for housekeeping. Share these
cores across many adapters and various purposes to avoid extra CPU
overhead.

Since housekeeping services talk to the NIC, it should be possible
to choose a logical core on the matching NUMA node.
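
A minimal usage sketch (not part of the patch) of the new helper,
assuming at least one service core was reserved via the EAL '-s'
option; the 'example_' function and the fallback to any NUMA node are
hypothetical illustration only:

  #include <errno.h>

  #include <rte_lcore.h>
  #include <rte_memory.h>

  #include "sfc_service.h"

  /* Pick a service lcore on the NIC's NUMA node, falling back to any node. */
  static int
  example_pick_housekeeping_lcore(int nic_socket_id, uint32_t *lcore_idp)
  {
          uint32_t lcore_id;

          lcore_id = sfc_get_service_lcore(nic_socket_id);
          if (lcore_id == RTE_MAX_LCORE && nic_socket_id != SOCKET_ID_ANY)
                  lcore_id = sfc_get_service_lcore(SOCKET_ID_ANY);
          if (lcore_id == RTE_MAX_LCORE)
                  return ENOTSUP;

          *lcore_idp = lcore_id;
          return 0;
  }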

Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/meson.build   |  1 +
 drivers/net/sfc/sfc_service.c | 99 +++
 drivers/net/sfc/sfc_service.h | 20 +++
 3 files changed, 120 insertions(+)
 create mode 100644 drivers/net/sfc/sfc_service.c
 create mode 100644 drivers/net/sfc/sfc_service.h

diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
index ccf5984d87..4ac97e8d43 100644
--- a/drivers/net/sfc/meson.build
+++ b/drivers/net/sfc/meson.build
@@ -62,4 +62,5 @@ sources = files(
 'sfc_ef10_tx.c',
 'sfc_ef100_rx.c',
 'sfc_ef100_tx.c',
+'sfc_service.c',
 )
diff --git a/drivers/net/sfc/sfc_service.c b/drivers/net/sfc/sfc_service.c
new file mode 100644
index 00..9c89484406
--- /dev/null
+++ b/drivers/net/sfc/sfc_service.c
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright(c) 2020-2021 Xilinx, Inc.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sfc_log.h"
+#include "sfc_service.h"
+#include "sfc_debug.h"
+
+static uint32_t sfc_service_lcore[RTE_MAX_NUMA_NODES];
+static rte_spinlock_t sfc_service_lcore_lock = RTE_SPINLOCK_INITIALIZER;
+
+RTE_INIT(sfc_service_lcore_init)
+{
+   size_t i;
+
+   for (i = 0; i < RTE_DIM(sfc_service_lcore); ++i)
+   sfc_service_lcore[i] = RTE_MAX_LCORE;
+}
+
+static uint32_t
+sfc_find_service_lcore(int *socket_id)
+{
+   uint32_t service_core_list[RTE_MAX_LCORE];
+   uint32_t lcore_id;
+   int num;
+   int i;
+
+   SFC_ASSERT(rte_spinlock_is_locked(&sfc_service_lcore_lock));
+
+   num = rte_service_lcore_list(service_core_list,
+   RTE_DIM(service_core_list));
+   if (num == 0) {
+   SFC_GENERIC_LOG(WARNING, "No service cores available");
+   return RTE_MAX_LCORE;
+   }
+   if (num < 0) {
+   SFC_GENERIC_LOG(ERR, "Failed to get service core list");
+   return RTE_MAX_LCORE;
+   }
+
+   for (i = 0; i < num; ++i) {
+   lcore_id = service_core_list[i];
+
+   if (*socket_id == SOCKET_ID_ANY) {
+   *socket_id = rte_lcore_to_socket_id(lcore_id);
+   break;
+   } else if (rte_lcore_to_socket_id(lcore_id) ==
+  (unsigned int)*socket_id) {
+   break;
+   }
+   }
+
+   if (i == num) {
+   SFC_GENERIC_LOG(WARNING,
+   "No service cores reserved at socket %d", *socket_id);
+   return RTE_MAX_LCORE;
+   }
+
+   return lcore_id;
+}
+
+uint32_t
+sfc_get_service_lcore(int socket_id)
+{
+   uint32_t lcore_id = RTE_MAX_LCORE;
+
+   rte_spinlock_lock(&sfc_service_lcore_lock);
+
+   if (socket_id != SOCKET_ID_ANY) {
+   lcore_id = sfc_service_lcore[socket_id];
+   } else {
+   size_t i;
+
+   for (i = 0; i < RTE_DIM(sfc_service_lcore); ++i) {
+   if (sfc_service_lcore[i] != RTE_MAX_LCORE) {
+   lcore_id = sfc_service_lcore[i];
+   break;
+   }
+   }
+   }
+
+   if (lcore_id == RTE_MAX_LCORE) {
+   lcore_id = sfc_find_service_lcore(&socket_id);
+   if (lcore_id != RTE_MAX_LCORE)
+   sfc_service_lcore[socket_id] = lcore_id;
+   }
+
+   rte_spinlock_unlock(&sfc_service_lcore_lock);
+   return lcore_id;
+}
diff --git a/drivers/net/sfc/sfc_service.h b/drivers/net/sfc/sfc_service.h
new file mode 100644
index 00..bbcce28479
--- /dev/null
+++ b/drivers/net/sfc/sfc_service.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright(c) 2020-2021 Xilinx, Inc.
+ */
+
+#ifndef _SFC_SERVICE_H
+#define _SFC_SERVICE_H
+
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+uint32_t sfc_get_service_lcore(int socket_id);
+
+#ifdef __cplusplus
+}
+#endif
+#endif  /* _SFC_SERVICE_H */
-- 
2.30.2



[dpdk-dev] [PATCH v2 13/20] common/sfc_efx/base: add counter creation MCDI wrappers

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Users will be able to create and free MAE counters. Support for
associating counters with an action set will be added in upcoming
patches.
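
A minimal usage sketch (not part of the patch), based on the prototypes
added below; the 'example_' helpers and the error-handling policy are
hypothetical:

  #include <errno.h>

  #include "efx.h"

  static efx_rc_t
  example_alloc_one_counter(efx_nic_t *enp, efx_counter_t *counterp)
  {
          uint32_t n_allocated = 0;
          uint32_t gen_count;
          efx_rc_t rc;

          rc = efx_mae_counters_alloc(enp, 1, &n_allocated, counterp,
                                      &gen_count);
          if (rc != 0)
                  return (rc);
          if (n_allocated != 1)
                  return (ENOSPC);

          return (0);
  }

  static void
  example_free_one_counter(efx_nic_t *enp, const efx_counter_t *counterp)
  {
          uint32_t n_freed;
          uint32_t gen_count;

          (void)efx_mae_counters_free(enp, 1, &n_freed, counterp, &gen_count);
  }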

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/common/sfc_efx/base/efx.h  |  37 ++
 drivers/common/sfc_efx/base/efx_impl.h |   1 +
 drivers/common/sfc_efx/base/efx_mae.c  | 158 +
 drivers/common/sfc_efx/base/efx_mcdi.h |   7 ++
 drivers/common/sfc_efx/version.map |   2 +
 5 files changed, 205 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index 9bbd7cae55..d0f8bc10b3 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4406,6 +4406,10 @@ efx_mae_action_set_fill_in_eh_id(
__inefx_mae_actions_t *spec,
__inconst efx_mae_eh_id_t *eh_idp);
 
+typedef struct efx_counter_s {
+   uint32_t id;
+} efx_counter_t;
+
 /* Action set ID */
 typedef struct efx_mae_aset_id_s {
uint32_t id;
@@ -4418,6 +4422,39 @@ efx_mae_action_set_alloc(
__inconst efx_mae_actions_t *spec,
__out   efx_mae_aset_id_t *aset_idp);
 
+/*
+ * Generation count has two purposes:
+ *
+ * 1) Distinguish between counter packets that belong to freed counter
+ *and the packets that belong to reallocated counter (with the same ID);
+ * 2) Make sure that all packets are received for a counter that was freed;
+ *
+ * API users should provide generation count out parameter in allocation
+ * function if counters can be reallocated and consistent counter values are
+ * required.
+ *
+ * API users that need consistent final counter values after counter
+ * deallocation or counter stream stop should provide the parameter in
+ * functions that free the counters and stop the counter stream.
+ */
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_counters_alloc(
+   __inefx_nic_t *enp,
+   __inuint32_t n_counters,
+   __out   uint32_t *n_allocatedp,
+   __out_ecount(n_counters)efx_counter_t *countersp,
+   __out_opt   uint32_t *gen_countp);
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_counters_free(
+   __inefx_nic_t *enp,
+   __inuint32_t n_counters,
+   __out   uint32_t *n_freedp,
+   __in_ecount(n_counters) const efx_counter_t *countersp,
+   __out_opt   uint32_t *gen_countp);
+
 LIBEFX_API
 extern __checkReturn   efx_rc_t
 efx_mae_action_set_free(
diff --git a/drivers/common/sfc_efx/base/efx_impl.h 
b/drivers/common/sfc_efx/base/efx_impl.h
index f891e2616e..9dbf6d450c 100644
--- a/drivers/common/sfc_efx/base/efx_impl.h
+++ b/drivers/common/sfc_efx/base/efx_impl.h
@@ -821,6 +821,7 @@ typedef struct efx_mae_s {
/** Outer rule match field capabilities. */
efx_mae_field_cap_t *em_outer_rule_field_caps;
size_t  em_outer_rule_field_caps_size;
+   uint32_tem_max_ncounters;
 } efx_mae_t;
 
 #endif /* EFSYS_OPT_MAE */
diff --git a/drivers/common/sfc_efx/base/efx_mae.c 
b/drivers/common/sfc_efx/base/efx_mae.c
index 5697488040..955f1d4353 100644
--- a/drivers/common/sfc_efx/base/efx_mae.c
+++ b/drivers/common/sfc_efx/base/efx_mae.c
@@ -67,6 +67,9 @@ efx_mae_get_capabilities(
maep->em_max_nfields =
MCDI_OUT_DWORD(req, MAE_GET_CAPS_OUT_MATCH_FIELD_COUNT);
 
+   maep->em_max_ncounters =
+   MCDI_OUT_DWORD(req, MAE_GET_CAPS_OUT_COUNTERS);
+
return (0);
 
 fail2:
@@ -2600,6 +2603,161 @@ efx_mae_action_rule_remove(
 
return (0);
 
+fail4:
+   EFSYS_PROBE(fail4);
+fail3:
+   EFSYS_PROBE(fail3);
+fail2:
+   EFSYS_PROBE(fail2);
+fail1:
+   EFSYS_PROBE1(fail1, efx_rc_t, rc);
+   return (rc);
+}
+
+   __checkReturn   efx_rc_t
+efx_mae_counters_alloc(
+   __inefx_nic_t *enp,
+   __inuint32_t n_counters,
+   __out   uint32_t *n_allocatedp,
+   __out_ecount(n_counters)efx_counter_t *countersp,
+   __out_opt   uint32_t *gen_countp)
+{
+   EFX_MCDI_DECLARE_BUF(payload,
+   MC_CMD_MAE_COUNTER_ALLOC_IN_LEN,
+   MC_CMD_MAE_COUNTER_ALLOC_OUT_LENMAX_MCDI2);
+   efx_mae_t *maep = enp->en_maep;
+   uint32_t n_allocated;
+   efx_mcdi_req_t req;
+   unsigned int i;
+   efx_rc_t rc;
+
+   if (n_counters > maep->em_max_ncounters ||
+   n_counters < MC_CMD_MAE_COUNTER_ALLOC_OUT_COUNTER_ID_MINNUM ||
+   n_counters > MC_CM

[dpdk-dev] [PATCH v2 14/20] common/sfc_efx/base: add counter stream MCDI wrappers

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

The MCDIs will be used to control counter Rx queue packet flow.
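
A minimal usage sketch (not part of the patch) of starting the counter
stream on a given RxQ; the wrapper name is hypothetical and the flag
handling is illustrative only:

  #include "efx.h"

  static efx_rc_t
  example_start_counter_stream(efx_nic_t *enp, uint16_t rxq_hw_id,
                               uint16_t packet_size,
                               boolean_t *uses_creditsp)
  {
          uint32_t flags_out = 0;
          efx_rc_t rc;

          /* flags_in = 0 keeps the default: zero-valued counters are squashed */
          rc = efx_mae_counters_stream_start(enp, rxq_hw_id, packet_size,
                                             0, &flags_out);
          if (rc != 0)
                  return (rc);

          /*
           * If credits are used, the driver must later call
           * efx_mae_counters_stream_give_credits() as Rx descriptors
           * are pushed to the counter queue.
           */
          *uses_creditsp =
              (flags_out & EFX_MAE_COUNTERS_STREAM_OUT_USES_CREDITS) ?
              B_TRUE : B_FALSE;

          return (0);
  }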

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/common/sfc_efx/base/efx.h |  32 ++
 drivers/common/sfc_efx/base/efx_mae.c | 138 ++
 drivers/common/sfc_efx/version.map|   3 +
 3 files changed, 173 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index d0f8bc10b3..cc173d13c6 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4455,6 +4455,38 @@ efx_mae_counters_free(
__in_ecount(n_counters) const efx_counter_t *countersp,
__out_opt   uint32_t *gen_countp);
 
+/* When set, include counters with a value of zero */
+#defineEFX_MAE_COUNTERS_STREAM_IN_ZERO_SQUASH_DISABLE  (1U << 0)
+
+/*
+ * Set if credit-based flow control is used. In this case the driver
+ * must call efx_mae_counters_stream_give_credits() to notify the
+ * packetiser of descriptors written.
+ */
+#defineEFX_MAE_COUNTERS_STREAM_OUT_USES_CREDITS(1U << 0)
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_counters_stream_start(
+   __inefx_nic_t *enp,
+   __inuint16_t rxq_id,
+   __inuint16_t packet_size,
+   __inuint32_t flags_in,
+   __out   uint32_t *flags_out);
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_counters_stream_stop(
+   __inefx_nic_t *enp,
+   __inuint16_t rxq_id,
+   __out_opt   uint32_t *gen_countp);
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_counters_stream_give_credits(
+   __inefx_nic_t *enp,
+   __inuint32_t n_credits);
+
 LIBEFX_API
 extern __checkReturn   efx_rc_t
 efx_mae_action_set_free(
diff --git a/drivers/common/sfc_efx/base/efx_mae.c 
b/drivers/common/sfc_efx/base/efx_mae.c
index 955f1d4353..0b3131161b 100644
--- a/drivers/common/sfc_efx/base/efx_mae.c
+++ b/drivers/common/sfc_efx/base/efx_mae.c
@@ -2766,6 +2766,144 @@ efx_mae_counters_free(
EFSYS_PROBE(fail2);
 fail1:
EFSYS_PROBE1(fail1, efx_rc_t, rc);
+
+   return (rc);
+}
+
+   __checkReturn   efx_rc_t
+efx_mae_counters_stream_start(
+   __inefx_nic_t *enp,
+   __inuint16_t rxq_id,
+   __inuint16_t packet_size,
+   __inuint32_t flags_in,
+   __out   uint32_t *flags_out)
+{
+   efx_mcdi_req_t req;
+   EFX_MCDI_DECLARE_BUF(payload, MC_CMD_MAE_COUNTERS_STREAM_START_IN_LEN,
+MC_CMD_MAE_COUNTERS_STREAM_START_OUT_LEN);
+   efx_rc_t rc;
+
+   EFX_STATIC_ASSERT(EFX_MAE_COUNTERS_STREAM_IN_ZERO_SQUASH_DISABLE ==
+   1U << MC_CMD_MAE_COUNTERS_STREAM_START_IN_ZERO_SQUASH_DISABLE_LBN);
+
+   EFX_STATIC_ASSERT(EFX_MAE_COUNTERS_STREAM_OUT_USES_CREDITS ==
+   1U << MC_CMD_MAE_COUNTERS_STREAM_START_OUT_USES_CREDITS_LBN);
+
+   req.emr_cmd = MC_CMD_MAE_COUNTERS_STREAM_START;
+   req.emr_in_buf = payload;
+   req.emr_in_length = MC_CMD_MAE_COUNTERS_STREAM_START_IN_LEN;
+   req.emr_out_buf = payload;
+   req.emr_out_length = MC_CMD_MAE_COUNTERS_STREAM_START_OUT_LEN;
+
+   MCDI_IN_SET_WORD(req, MAE_COUNTERS_STREAM_START_IN_QID, rxq_id);
+   MCDI_IN_SET_WORD(req, MAE_COUNTERS_STREAM_START_IN_PACKET_SIZE,
+packet_size);
+   MCDI_IN_SET_DWORD(req, MAE_COUNTERS_STREAM_START_IN_FLAGS, flags_in);
+
+   efx_mcdi_execute(enp, &req);
+
+   if (req.emr_rc != 0) {
+   rc = req.emr_rc;
+   goto fail1;
+   }
+
+   if (req.emr_out_length_used <
+   MC_CMD_MAE_COUNTERS_STREAM_START_OUT_LEN) {
+   rc = EMSGSIZE;
+   goto fail2;
+   }
+
+   *flags_out = MCDI_OUT_DWORD(req, MAE_COUNTERS_STREAM_START_OUT_FLAGS);
+
+   return (0);
+
+fail2:
+   EFSYS_PROBE(fail2);
+fail1:
+   EFSYS_PROBE1(fail1, efx_rc_t, rc);
+
+   return (rc);
+}
+
+   __checkReturn   efx_rc_t
+efx_mae_counters_stream_stop(
+   __inefx_nic_t *enp,
+   __inuint16_t rxq_id,
+   __out_opt   uint32_t *gen_countp)
+{
+   efx_mcdi_req_t req;
+   EFX_MCDI_DECLARE_BUF(payload, MC_CMD_MAE_COUNTERS_STREAM_STOP_IN_LEN,
+MC_CMD_MAE_COUNTERS_STREAM_STOP_OUT_LEN);
+   efx_rc_t rc;
+
+   req.emr_cmd = MC_CMD_MAE_COUNTERS_STREAM_STOP;
+   req.emr_in_b

[dpdk-dev] [PATCH v2 12/20] net/sfc: reserve RxQ for counters

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

MAE delivers counter data as special packets via a dedicated Rx queue.
Reserve an RxQ so that it does not interfere with ethdev Rx queues.
A routine will be added later to handle these packets.

There is no point in reserving the queue if no service cores are
available, since counters cannot be used in that case.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/meson.build   |   1 +
 drivers/net/sfc/sfc.c |  68 --
 drivers/net/sfc/sfc.h |  19 +++
 drivers/net/sfc/sfc_dp.h  |   2 +
 drivers/net/sfc/sfc_ev.h  |  72 --
 drivers/net/sfc/sfc_mae.c |   1 +
 drivers/net/sfc/sfc_mae_counter.c | 217 ++
 drivers/net/sfc/sfc_mae_counter.h |  44 ++
 drivers/net/sfc/sfc_rx.c  |  43 --
 9 files changed, 438 insertions(+), 29 deletions(-)
 create mode 100644 drivers/net/sfc/sfc_mae_counter.c
 create mode 100644 drivers/net/sfc/sfc_mae_counter.h

diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
index 4ac97e8d43..f8880f740a 100644
--- a/drivers/net/sfc/meson.build
+++ b/drivers/net/sfc/meson.build
@@ -55,6 +55,7 @@ sources = files(
 'sfc_filter.c',
 'sfc_switch.c',
 'sfc_mae.c',
+'sfc_mae_counter.c',
 'sfc_flow.c',
 'sfc_dp.c',
 'sfc_ef10_rx.c',
diff --git a/drivers/net/sfc/sfc.c b/drivers/net/sfc/sfc.c
index 3477c7530b..4097cf39de 100644
--- a/drivers/net/sfc/sfc.c
+++ b/drivers/net/sfc/sfc.c
@@ -20,6 +20,7 @@
 #include "sfc_log.h"
 #include "sfc_ev.h"
 #include "sfc_rx.h"
+#include "sfc_mae_counter.h"
 #include "sfc_tx.h"
 #include "sfc_kvargs.h"
 #include "sfc_tweak.h"
@@ -174,6 +175,7 @@ static int
 sfc_estimate_resource_limits(struct sfc_adapter *sa)
 {
const efx_nic_cfg_t *encp = efx_nic_cfg_get(sa->nic);
+   struct sfc_adapter_shared *sas = sfc_sa2shared(sa);
efx_drv_limits_t limits;
int rc;
uint32_t evq_allocated;
@@ -235,17 +237,53 @@ sfc_estimate_resource_limits(struct sfc_adapter *sa)
rxq_allocated = MIN(rxq_allocated, limits.edl_max_rxq_count);
txq_allocated = MIN(txq_allocated, limits.edl_max_txq_count);
 
-   /* Subtract management EVQ not used for traffic */
-   SFC_ASSERT(evq_allocated > 0);
+   /*
+* Subtract management EVQ not used for traffic
+* The resource allocation strategy is as follows:
+* - one EVQ for management
+* - one EVQ for each ethdev RXQ
+* - one EVQ for each ethdev TXQ
+* - one EVQ and one RXQ for optional MAE counters.
+*/
+   if (evq_allocated == 0) {
+   sfc_err(sa, "count of allocated EvQ is 0");
+   rc = ENOMEM;
+   goto fail_allocate_evq;
+   }
evq_allocated--;
 
-   /* Right now we use separate EVQ for Rx and Tx */
-   sa->rxq_max = MIN(rxq_allocated, evq_allocated / 2);
-   sa->txq_max = MIN(txq_allocated, evq_allocated - sa->rxq_max);
+   /*
+* Reserve absolutely required minimum.
+* Right now we use separate EVQ for Rx and Tx.
+*/
+   if (rxq_allocated > 0 && evq_allocated > 0) {
+   sa->rxq_max = 1;
+   rxq_allocated--;
+   evq_allocated--;
+   }
+   if (txq_allocated > 0 && evq_allocated > 0) {
+   sa->txq_max = 1;
+   txq_allocated--;
+   evq_allocated--;
+   }
+
+   if (sfc_mae_counter_rxq_required(sa) &&
+   rxq_allocated > 0 && evq_allocated > 0) {
+   rxq_allocated--;
+   evq_allocated--;
+   sas->counters_rxq_allocated = true;
+   } else {
+   sas->counters_rxq_allocated = false;
+   }
+
+   /* Add remaining allocated queues */
+   sa->rxq_max += MIN(rxq_allocated, evq_allocated / 2);
+   sa->txq_max += MIN(txq_allocated, evq_allocated - sa->rxq_max);
 
/* Keep NIC initialized */
return 0;
 
+fail_allocate_evq:
 fail_get_vi_pool:
efx_nic_fini(sa->nic);
 fail_nic_init:
@@ -256,14 +294,20 @@ static int
 sfc_set_drv_limits(struct sfc_adapter *sa)
 {
const struct rte_eth_dev_data *data = sa->eth_dev->data;
+   uint32_t rxq_reserved = sfc_nb_reserved_rxq(sfc_sa2shared(sa));
efx_drv_limits_t lim;
 
memset(&lim, 0, sizeof(lim));
 
-   /* Limits are strict since take into account initial estimation */
+   /*
+* Limits are strict since take into account initial estimation.
+* Resource allocation stategy is described in
+* sfc_estimate_resource_limits().
+*/
lim.edl_min_evq_count = lim.edl_max_evq_count =
-   1 + data->nb_rx_queues + data->nb_tx_queues;
-   lim.edl_min_rxq_count = lim.edl_max_rxq_count = data->nb_rx_queues;
+   1 + data->nb_rx_queues + data->nb_tx_queues + rxq_reserved;
+   lim.edl_min_rx

[dpdk-dev] [PATCH v2 15/20] common/sfc_efx/base: support counter in action set

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

Users will be able to associate a counter with an MAE action set to
collect packet and byte counts for a specific action set.
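
A minimal sketch (not part of the patch) of the intended two-step usage,
assuming the counter has already been allocated with
efx_mae_counters_alloc(); the wrapper name is hypothetical:

  #include "efx.h"

  static efx_rc_t
  example_action_set_with_count(efx_nic_t *enp, efx_mae_actions_t *spec,
                                const efx_counter_t *counterp,
                                efx_mae_aset_id_t *aset_idp)
  {
          efx_rc_t rc;

          /* Step 1: add the COUNT action to the action set specification. */
          rc = efx_mae_action_set_populate_count(spec);
          if (rc != 0)
                  return (rc);

          /* Step 2: fill in the real counter ID before allocation. */
          rc = efx_mae_action_set_fill_in_counter_id(spec, counterp);
          if (rc != 0)
                  return (rc);

          return (efx_mae_action_set_alloc(enp, spec, aset_idp));
  }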

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/common/sfc_efx/base/efx.h  |  21 
 drivers/common/sfc_efx/base/efx_impl.h |   3 +
 drivers/common/sfc_efx/base/efx_mae.c  | 133 -
 drivers/common/sfc_efx/version.map |   3 +
 4 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index cc173d13c6..628e61e065 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4306,6 +4306,15 @@ extern   __checkReturn   efx_rc_t
 efx_mae_action_set_populate_encap(
__inefx_mae_actions_t *spec);
 
+/*
+ * Use efx_mae_action_set_fill_in_counter_id() to set ID of a counter
+ * in the specification prior to action set allocation.
+ */
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_action_set_populate_count(
+   __inefx_mae_actions_t *spec);
+
 LIBEFX_API
 extern __checkReturn   efx_rc_t
 efx_mae_action_set_populate_flag(
@@ -4410,6 +4419,18 @@ typedef struct efx_counter_s {
uint32_t id;
 } efx_counter_t;
 
+LIBEFX_API
+extern __checkReturn   unsigned int
+efx_mae_action_set_get_nb_count(
+   __inconst efx_mae_actions_t *spec);
+
+/* See description before efx_mae_action_set_populate_count(). */
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_mae_action_set_fill_in_counter_id(
+   __inefx_mae_actions_t *spec,
+   __inconst efx_counter_t *counter_idp);
+
 /* Action set ID */
 typedef struct efx_mae_aset_id_s {
uint32_t id;
diff --git a/drivers/common/sfc_efx/base/efx_impl.h 
b/drivers/common/sfc_efx/base/efx_impl.h
index 9dbf6d450c..992edbabe3 100644
--- a/drivers/common/sfc_efx/base/efx_impl.h
+++ b/drivers/common/sfc_efx/base/efx_impl.h
@@ -1734,6 +1734,7 @@ typedef enum efx_mae_action_e {
EFX_MAE_ACTION_DECAP,
EFX_MAE_ACTION_VLAN_POP,
EFX_MAE_ACTION_VLAN_PUSH,
+   EFX_MAE_ACTION_COUNT,
EFX_MAE_ACTION_ENCAP,
 
/*
@@ -1764,6 +1765,7 @@ typedef struct efx_mae_action_vlan_push_s {
 
 typedef struct efx_mae_actions_rsrc_s {
efx_mae_eh_id_t emar_eh_id;
+   efx_counter_t   emar_counter_id;
 } efx_mae_actions_rsrc_t;
 
 struct efx_mae_actions_s {
@@ -1774,6 +1776,7 @@ struct efx_mae_actions_s {
unsigned intema_n_vlan_tags_to_push;
efx_mae_action_vlan_push_t  ema_vlan_push_descs[
EFX_MAE_VLAN_PUSH_MAX_NTAGS];
+   unsigned intema_n_count_actions;
uint32_tema_mark_value;
efx_mport_sel_t ema_deliver_mport;
 
diff --git a/drivers/common/sfc_efx/base/efx_mae.c 
b/drivers/common/sfc_efx/base/efx_mae.c
index 0b3131161b..8d1294a627 100644
--- a/drivers/common/sfc_efx/base/efx_mae.c
+++ b/drivers/common/sfc_efx/base/efx_mae.c
@@ -1191,6 +1191,7 @@ efx_mae_action_set_spec_init(
}
 
spec->ema_rsrc.emar_eh_id.id = EFX_MAE_RSRC_ID_INVALID;
+   spec->ema_rsrc.emar_counter_id.id = EFX_MAE_RSRC_ID_INVALID;
 
*specp = spec;
 
@@ -1358,6 +1359,50 @@ efx_mae_action_set_add_encap(
return (rc);
 }
 
+static __checkReturn   efx_rc_t
+efx_mae_action_set_add_count(
+   __inefx_mae_actions_t *spec,
+   __insize_t arg_size,
+   __in_bcount(arg_size)   const uint8_t *arg)
+{
+   efx_rc_t rc;
+
+   EFX_STATIC_ASSERT(EFX_MAE_RSRC_ID_INVALID ==
+ MC_CMD_MAE_COUNTER_ALLOC_OUT_COUNTER_ID_NULL);
+
+   /*
+* Preparing an action set spec to update a counter requires
+* two steps: first add this action to the action spec, and then
+* add the counter ID to the spec. This allows validity checking
+* and resource allocation to be done separately.
+* Mark the counter ID as invalid in the spec to ensure that the
+* caller must also invoke efx_mae_action_set_fill_in_counter_id()
+* before action set allocation.
+*/
+   spec->ema_rsrc.emar_counter_id.id = EFX_MAE_RSRC_ID_INVALID;
+
+   /* Nothing else is supposed to take place over here. */
+   if (arg_size != 0) {
+   rc = EINVAL;
+   goto fail1;
+   }
+
+   if (arg != NULL) {
+   rc = EINVAL;
+   goto fail2;
+   }
+
+   ++(spec->ema_n_count_actions);
+
+   return (0);
+
+fail2:
+   EFSYS_PROBE(fail2);
+fail1:
+   EFSYS_PROBE1(fail1, efx_rc_t, rc);
+   return (rc);

[dpdk-dev] [PATCH v2 16/20] net/sfc: add Rx datapath method to get pushed buffers count

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

The information about the number of pushed Rx buffers is required
by the counter Rx queue to know when to give credits to the counter
stream.
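
A minimal sketch (not part of the patch) of how a caller could turn the
running pushed counter into a credit delta; the helper name and the
caller-maintained 'last_pushed' state are hypothetical:

  #include "sfc.h"
  #include "sfc_rx.h"

  static unsigned int
  example_new_credits(struct sfc_adapter *sa, struct sfc_dp_rxq *dp_rxq,
                      unsigned int *last_pushedp)
  {
          unsigned int pushed = sfc_rx_get_pushed(sa, dp_rxq);
          /* Running counters: unsigned subtraction handles wrap-around. */
          unsigned int delta = pushed - *last_pushedp;

          *last_pushedp = pushed;

          return delta;
  }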

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc_dp_rx.h|  4 
 drivers/net/sfc/sfc_ef100_rx.c | 15 +++
 drivers/net/sfc/sfc_rx.c   |  9 +
 drivers/net/sfc/sfc_rx.h   |  3 +++
 4 files changed, 31 insertions(+)

diff --git a/drivers/net/sfc/sfc_dp_rx.h b/drivers/net/sfc/sfc_dp_rx.h
index 3f6857b1ff..b6c44085ce 100644
--- a/drivers/net/sfc/sfc_dp_rx.h
+++ b/drivers/net/sfc/sfc_dp_rx.h
@@ -204,6 +204,9 @@ typedef int (sfc_dp_rx_intr_enable_t)(struct sfc_dp_rxq 
*dp_rxq);
 /** Disable Rx interrupts */
 typedef int (sfc_dp_rx_intr_disable_t)(struct sfc_dp_rxq *dp_rxq);
 
+/** Get number of pushed Rx buffers */
+typedef unsigned int (sfc_dp_rx_get_pushed_t)(struct sfc_dp_rxq *dp_rxq);
+
 /** Receive datapath definition */
 struct sfc_dp_rx {
struct sfc_dp   dp;
@@ -238,6 +241,7 @@ struct sfc_dp_rx {
sfc_dp_rx_qdesc_status_t*qdesc_status;
sfc_dp_rx_intr_enable_t *intr_enable;
sfc_dp_rx_intr_disable_t*intr_disable;
+   sfc_dp_rx_get_pushed_t  *get_pushed;
eth_rx_burst_t  pkt_burst;
 };
 
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index 8cde24c585..7447f8b9de 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -892,6 +892,20 @@ sfc_ef100_rx_intr_disable(struct sfc_dp_rxq *dp_rxq)
return 0;
 }
 
+static sfc_dp_rx_get_pushed_t sfc_ef100_rx_get_pushed;
+static unsigned int
+sfc_ef100_rx_get_pushed(struct sfc_dp_rxq *dp_rxq)
+{
+   struct sfc_ef100_rxq *rxq = sfc_ef100_rxq_by_dp_rxq(dp_rxq);
+
+   /*
+* The datapath keeps track only of added descriptors, since
+* the number of pushed descriptors always equals the number
+* of added descriptors due to enforced alignment.
+*/
+   return rxq->added;
+}
+
 struct sfc_dp_rx sfc_ef100_rx = {
.dp = {
.name   = SFC_KVARG_DATAPATH_EF100,
@@ -919,5 +933,6 @@ struct sfc_dp_rx sfc_ef100_rx = {
.qdesc_status   = sfc_ef100_rx_qdesc_status,
.intr_enable= sfc_ef100_rx_intr_enable,
.intr_disable   = sfc_ef100_rx_intr_disable,
+   .get_pushed = sfc_ef100_rx_get_pushed,
.pkt_burst  = sfc_ef100_recv_pkts,
 };
diff --git a/drivers/net/sfc/sfc_rx.c b/drivers/net/sfc/sfc_rx.c
index 0532f77082..f6a8ac68e8 100644
--- a/drivers/net/sfc/sfc_rx.c
+++ b/drivers/net/sfc/sfc_rx.c
@@ -53,6 +53,15 @@ sfc_rx_qflush_failed(struct sfc_rxq_info *rxq_info)
rxq_info->state &= ~SFC_RXQ_FLUSHING;
 }
 
+/* This returns the running counter, which is not bounded by ring size */
+unsigned int
+sfc_rx_get_pushed(struct sfc_adapter *sa, struct sfc_dp_rxq *dp_rxq)
+{
+   SFC_ASSERT(sa->priv.dp_rx->get_pushed != NULL);
+
+   return sa->priv.dp_rx->get_pushed(dp_rxq);
+}
+
 static int
 sfc_efx_rx_qprime(struct sfc_efx_rxq *rxq)
 {
diff --git a/drivers/net/sfc/sfc_rx.h b/drivers/net/sfc/sfc_rx.h
index e5a6fde79b..4ab513915e 100644
--- a/drivers/net/sfc/sfc_rx.h
+++ b/drivers/net/sfc/sfc_rx.h
@@ -145,6 +145,9 @@ uint64_t sfc_rx_get_queue_offload_caps(struct sfc_adapter 
*sa);
 void sfc_rx_qflush_done(struct sfc_rxq_info *rxq_info);
 void sfc_rx_qflush_failed(struct sfc_rxq_info *rxq_info);
 
+unsigned int sfc_rx_get_pushed(struct sfc_adapter *sa,
+  struct sfc_dp_rxq *dp_rxq);
+
 int sfc_rx_hash_init(struct sfc_adapter *sa);
 void sfc_rx_hash_fini(struct sfc_adapter *sa);
 int sfc_rx_hf_rte_to_efx(struct sfc_adapter *sa, uint64_t rte,
-- 
2.30.2



[dpdk-dev] [PATCH v2 17/20] common/sfc_efx/base: add max MAE counters to limits

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

The information about the maximum number of MAE counters is
crucial to the counter support in the driver.

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/common/sfc_efx/base/efx.h | 1 +
 drivers/common/sfc_efx/base/efx_mae.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index 628e61e065..b2301b845a 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4093,6 +4093,7 @@ typedef struct efx_mae_limits_s {
uint32_teml_max_n_outer_prios;
uint32_teml_encap_types_supported;
uint32_teml_encap_header_size_limit;
+   uint32_teml_max_n_counters;
 } efx_mae_limits_t;
 
 LIBEFX_API
diff --git a/drivers/common/sfc_efx/base/efx_mae.c 
b/drivers/common/sfc_efx/base/efx_mae.c
index 8d1294a627..5a320dcda6 100644
--- a/drivers/common/sfc_efx/base/efx_mae.c
+++ b/drivers/common/sfc_efx/base/efx_mae.c
@@ -374,6 +374,7 @@ efx_mae_get_limits(
emlp->eml_encap_types_supported = maep->em_encap_types_supported;
emlp->eml_encap_header_size_limit =
MC_CMD_MAE_ENCAP_HEADER_ALLOC_IN_HDR_DATA_MAXNUM_MCDI2;
+   emlp->eml_max_n_counters = maep->em_max_ncounters;
 
return (0);
 
-- 
2.30.2



[dpdk-dev] [PATCH v2 18/20] common/sfc_efx/base: add packetiser packet format definition

2021-06-04 Thread Andrew Rybchenko
The packetiser composes packets carrying MAE counter updates.
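
A minimal decoding sketch (not part of the patch), assuming the
little-endian bit numbering used by the other EFX register definitions;
the helper is hypothetical and only valid for fields up to 64 bits wide:

  #include <stdint.h>

  #include "efx_regs_counters_pkt_format.h"

  static uint64_t
  example_get_bits(const uint8_t *buf, unsigned int lbn, unsigned int width)
  {
          uint64_t val = 0;
          unsigned int i;

          for (i = 0; i < width; ++i) {
                  unsigned int bit = lbn + i;

                  val |= (uint64_t)((buf[bit / 8] >> (bit % 8)) & 1) << i;
          }

          return val;
  }

  /*
   * For example, the header version of a received counter packet:
   *
   * uint64_t version = example_get_bits(pkt,
   *         ERF_SC_PACKETISER_HEADER_VERSION_LBN,
   *         ERF_SC_PACKETISER_HEADER_VERSION_WIDTH);
   */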

Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 .../base/efx_regs_counters_pkt_format.h   | 87 +++
 1 file changed, 87 insertions(+)
 create mode 100644 drivers/common/sfc_efx/base/efx_regs_counters_pkt_format.h

diff --git a/drivers/common/sfc_efx/base/efx_regs_counters_pkt_format.h 
b/drivers/common/sfc_efx/base/efx_regs_counters_pkt_format.h
new file mode 100644
index 00..6610d07dc0
--- /dev/null
+++ b/drivers/common/sfc_efx/base/efx_regs_counters_pkt_format.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright(c) 2020-2021 Xilinx, Inc.
+ */
+
+#ifndef_SYS_EFX_REGS_COUNTERS_PKT_FORMAT_H
+#define_SYS_EFX_REGS_COUNTERS_PKT_FORMAT_H
+
+/*
+ * Packetiser packet format definition.
+ * SF-122415-TC - OVS Counter Design Specification section 7
+ * Primary copy of the header is located in the smartnic_registry repo:
+ * src/ovs_counter/packetiser_packet_format.h
+ */
+
+/**/
+/*
+ * ER_RX_SL_PACKETISER_HEADER_WORD(160bit):
+ *
+ */
+#defineER_RX_SL_PACKETISER_HEADER_WORD_SIZE 20
+
+#defineERF_SC_PACKETISER_HEADER_VERSION_LBN 0
+#defineERF_SC_PACKETISER_HEADER_VERSION_WIDTH 8
+/* Deprecated, use ERF_SC_PACKETISER_HEADER_VERSION_2 instead */
+#defineERF_SC_PACKETISER_HEADER_VERSION_VALUE 2
+#defineERF_SC_PACKETISER_HEADER_VERSION_2 2
+#defineERF_SC_PACKETISER_HEADER_IDENTIFIER_LBN 8
+#defineERF_SC_PACKETISER_HEADER_IDENTIFIER_WIDTH 8
+#defineERF_SC_PACKETISER_HEADER_IDENTIFIER_AR 0
+#defineERF_SC_PACKETISER_HEADER_IDENTIFIER_CT 1
+#defineERF_SC_PACKETISER_HEADER_HEADER_OFFSET_LBN 16
+#defineERF_SC_PACKETISER_HEADER_HEADER_OFFSET_WIDTH 8
+#defineERF_SC_PACKETISER_HEADER_HEADER_OFFSET_DEFAULT 0x4
+#defineERF_SC_PACKETISER_HEADER_PAYLOAD_OFFSET_LBN 24
+#defineERF_SC_PACKETISER_HEADER_PAYLOAD_OFFSET_WIDTH 8
+#defineERF_SC_PACKETISER_HEADER_PAYLOAD_OFFSET_DEFAULT 0x14
+#defineERF_SC_PACKETISER_HEADER_INDEX_LBN 32
+#defineERF_SC_PACKETISER_HEADER_INDEX_WIDTH 16
+#defineERF_SC_PACKETISER_HEADER_COUNT_LBN 48
+#defineERF_SC_PACKETISER_HEADER_COUNT_WIDTH 16
+#defineERF_SC_PACKETISER_HEADER_RESERVED_0_LBN 64
+#defineERF_SC_PACKETISER_HEADER_RESERVED_0_WIDTH 32
+#defineERF_SC_PACKETISER_HEADER_RESERVED_1_LBN 96
+#defineERF_SC_PACKETISER_HEADER_RESERVED_1_WIDTH 32
+#defineERF_SC_PACKETISER_HEADER_RESERVED_2_LBN 128
+#defineERF_SC_PACKETISER_HEADER_RESERVED_2_WIDTH 32
+
+
+/**/
+/*
+ * ER_RX_SL_PACKETISER_PAYLOAD_WORD(128bit):
+ *
+ */
+#defineER_RX_SL_PACKETISER_PAYLOAD_WORD_SIZE 16
+
+#defineERF_SC_PACKETISER_PAYLOAD_COUNTER_INDEX_LBN 0
+#defineERF_SC_PACKETISER_PAYLOAD_COUNTER_INDEX_WIDTH 24
+#defineERF_SC_PACKETISER_PAYLOAD_RESERVED_LBN 24
+#defineERF_SC_PACKETISER_PAYLOAD_RESERVED_WIDTH 8
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_OFST 4
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_SIZE 6
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_LBN 32
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_WIDTH 48
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_LO_OFST 4
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_LO_SIZE 4
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_LO_LBN 32
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_LO_WIDTH 32
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_HI_OFST 8
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_HI_SIZE 2
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_HI_LBN 64
+#defineERF_SC_PACKETISER_PAYLOAD_PACKET_COUNT_HI_WIDTH 16
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_OFST 10
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_SIZE 6
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_LBN 80
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_WIDTH 48
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_LO_OFST 10
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_LO_SIZE 2
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_LO_LBN 80
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_LO_WIDTH 16
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_HI_OFST 12
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_HI_SIZE 4
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_HI_LBN 96
+#defineERF_SC_PACKETISER_PAYLOAD_BYTE_COUNT_HI_WIDTH 32
+
+
+#endif /* _SYS_EFX_REGS_COUNTERS_PKT_FORMAT_H */
-- 
2.30.2



[dpdk-dev] [PATCH v2 20/20] net/sfc: support flow API query for count actions

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

The query reports the number of hits for a counter associated
with a flow rule.
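
A minimal application-side sketch (not part of the patch) of querying
the hit count through the generic flow API; the helper name is
hypothetical:

  #include <string.h>
  #include <stdint.h>

  #include <rte_flow.h>

  static int
  example_query_flow_hits(uint16_t port_id, struct rte_flow *flow,
                          uint64_t *hitsp)
  {
          const struct rte_flow_action count_action = {
                  .type = RTE_FLOW_ACTION_TYPE_COUNT,
          };
          struct rte_flow_query_count data;
          struct rte_flow_error error;
          int rc;

          memset(&data, 0, sizeof(data));

          rc = rte_flow_query(port_id, flow, &count_action, &data, &error);
          if (rc != 0)
                  return rc;

          *hitsp = data.hits_set ? data.hits : 0;

          return 0;
  }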

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 drivers/net/sfc/sfc_flow.c| 48 ++-
 drivers/net/sfc/sfc_flow.h|  6 +++
 drivers/net/sfc/sfc_mae.c | 64 +++
 drivers/net/sfc/sfc_mae.h |  1 +
 drivers/net/sfc/sfc_mae_counter.c | 32 
 drivers/net/sfc/sfc_mae_counter.h |  3 ++
 6 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 1294dbd3a7..a3721089ca 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -32,6 +32,7 @@ struct sfc_flow_ops_by_spec {
sfc_flow_cleanup_cb_t   *cleanup;
sfc_flow_insert_cb_t*insert;
sfc_flow_remove_cb_t*remove;
+   sfc_flow_query_cb_t *query;
 };
 
 static sfc_flow_parse_cb_t sfc_flow_parse_rte_to_filter;
@@ -45,6 +46,7 @@ static const struct sfc_flow_ops_by_spec sfc_flow_ops_filter 
= {
.cleanup = NULL,
.insert = sfc_flow_filter_insert,
.remove = sfc_flow_filter_remove,
+   .query = NULL,
 };
 
 static const struct sfc_flow_ops_by_spec sfc_flow_ops_mae = {
@@ -53,6 +55,7 @@ static const struct sfc_flow_ops_by_spec sfc_flow_ops_mae = {
.cleanup = sfc_mae_flow_cleanup,
.insert = sfc_mae_flow_insert,
.remove = sfc_mae_flow_remove,
+   .query = sfc_mae_flow_query,
 };
 
 static const struct sfc_flow_ops_by_spec *
@@ -2788,6 +2791,49 @@ sfc_flow_flush(struct rte_eth_dev *dev,
return -ret;
 }
 
+static int
+sfc_flow_query(struct rte_eth_dev *dev,
+  struct rte_flow *flow,
+  const struct rte_flow_action *action,
+  void *data,
+  struct rte_flow_error *error)
+{
+   struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
+   const struct sfc_flow_ops_by_spec *ops;
+   int ret;
+
+   sfc_adapter_lock(sa);
+
+   ops = sfc_flow_get_ops_by_spec(flow);
+   if (ops == NULL || ops->query == NULL) {
+   ret = rte_flow_error_set(error, ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+   "No backend to handle this flow");
+   goto fail_no_backend;
+   }
+
+   if (sa->state != SFC_ETHDEV_STARTED) {
+   ret = rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+   "Can't query the flow: the adapter is not started");
+   goto fail_not_started;
+   }
+
+   ret = ops->query(dev, flow, action, data, error);
+   if (ret != 0)
+   goto fail_query;
+
+   sfc_adapter_unlock(sa);
+
+   return 0;
+
+fail_query:
+fail_not_started:
+fail_no_backend:
+   sfc_adapter_unlock(sa);
+   return ret;
+}
+
 static int
 sfc_flow_isolate(struct rte_eth_dev *dev, int enable,
 struct rte_flow_error *error)
@@ -2814,7 +2860,7 @@ const struct rte_flow_ops sfc_flow_ops = {
.create = sfc_flow_create,
.destroy = sfc_flow_destroy,
.flush = sfc_flow_flush,
-   .query = NULL,
+   .query = sfc_flow_query,
.isolate = sfc_flow_isolate,
 };
 
diff --git a/drivers/net/sfc/sfc_flow.h b/drivers/net/sfc/sfc_flow.h
index bd3b374d68..99e5cf9cff 100644
--- a/drivers/net/sfc/sfc_flow.h
+++ b/drivers/net/sfc/sfc_flow.h
@@ -181,6 +181,12 @@ typedef int (sfc_flow_insert_cb_t)(struct sfc_adapter *sa,
 typedef int (sfc_flow_remove_cb_t)(struct sfc_adapter *sa,
   struct rte_flow *flow);
 
+typedef int (sfc_flow_query_cb_t)(struct rte_eth_dev *dev,
+ struct rte_flow *flow,
+ const struct rte_flow_action *action,
+ void *data,
+ struct rte_flow_error *error);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/sfc/sfc_mae.c b/drivers/net/sfc/sfc_mae.c
index 370a39da1d..ee1188bc1e 100644
--- a/drivers/net/sfc/sfc_mae.c
+++ b/drivers/net/sfc/sfc_mae.c
@@ -3277,3 +3277,67 @@ sfc_mae_flow_remove(struct sfc_adapter *sa,
 
return 0;
 }
+
+static int
+sfc_mae_query_counter(struct sfc_adapter *sa,
+ struct sfc_flow_spec_mae *spec,
+ const struct rte_flow_action *action,
+ struct rte_flow_query_count *data,
+ struct rte_flow_error *error)
+{
+   struct sfc_mae_action_set *action_set = spec->action_set;
+   const struct rte_flow_action_count *conf = action->conf;
+   unsigned int i;
+   int rc;
+
+   if (action_set->n_counters == 0) {
+   return rte_flow_error_set(error, EINVAL,
+   RTE_FLOW_ERROR_TYPE_ACTION, action,
+   "Queried flow rule does not have co

[dpdk-dev] [PATCH v2 19/20] net/sfc: support flow action COUNT in transfer rules

2021-06-04 Thread Andrew Rybchenko
From: Igor Romanov 

For now, a rule may have only one dedicated counter; shared counters
are not supported.

HW delivers (or "streams") counter readings using special packets.
The driver creates a dedicated Rx queue to receive such packets
and requests that HW start "streaming" the readings to it.

The counter queue is polled periodically, and the first available
service core is used for that. Hence, the user has to specify at least
one service core for counters to work. Such a core is shared by all
MAE-capable devices managed by the sfc driver.
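
A minimal application-side sketch (not part of the patch) of a transfer
rule that counts and drops all traffic; the helper name is hypothetical,
and at least one service core is assumed to have been reserved via the
EAL service core option (e.g. '-s'):

  #include <stdint.h>

  #include <rte_flow.h>

  static struct rte_flow *
  example_create_counted_drop_rule(uint16_t port_id,
                                   struct rte_flow_error *error)
  {
          const struct rte_flow_attr attr = { .transfer = 1 };
          const struct rte_flow_item pattern[] = {
                  { .type = RTE_FLOW_ITEM_TYPE_ETH },
                  { .type = RTE_FLOW_ITEM_TYPE_END },
          };
          const struct rte_flow_action actions[] = {
                  { .type = RTE_FLOW_ACTION_TYPE_COUNT },
                  { .type = RTE_FLOW_ACTION_TYPE_DROP },
                  { .type = RTE_FLOW_ACTION_TYPE_END },
          };

          return rte_flow_create(port_id, &attr, pattern, actions, error);
  }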

Signed-off-by: Igor Romanov 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
Reviewed-by: Ivan Malov 
---
 doc/guides/nics/sfc_efx.rst|   2 +
 doc/guides/rel_notes/release_21_08.rst |   6 +
 drivers/net/sfc/meson.build|  10 +
 drivers/net/sfc/sfc_flow.c |   7 +
 drivers/net/sfc/sfc_mae.c  | 231 +-
 drivers/net/sfc/sfc_mae.h  |  60 +++
 drivers/net/sfc/sfc_mae_counter.c  | 578 +
 drivers/net/sfc/sfc_mae_counter.h  |  11 +
 drivers/net/sfc/sfc_stats.h|  80 
 drivers/net/sfc/sfc_tweak.h|   9 +
 10 files changed, 989 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/sfc/sfc_stats.h

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index cf1269cc03..bd08118da7 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -240,6 +240,8 @@ Supported actions (***transfer*** rules):
 
 - PORT_ID
 
+- COUNT
+
 - DROP
 
 Validating flow rules depends on the firmware variant.
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..75688304da 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Solarflare network PMD.**
+
+  Updated the Solarflare ``sfc_efx`` driver with changes including:
+
+  * Added COUNT action support for SN1000 NICs
+
 
 Removed Items
 -
diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
index f8880f740a..32b58e3d76 100644
--- a/drivers/net/sfc/meson.build
+++ b/drivers/net/sfc/meson.build
@@ -39,6 +39,16 @@ foreach flag: extra_flags
 endif
 endforeach
 
+# for clang 32-bit compiles we need libatomic for 64-bit atomic ops
+if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
+ext_deps += cc.find_library('atomic')
+endif
+
+# for gcc compiles we need -latomic for 128-bit atomic ops
+if cc.get_id() == 'gcc'
+ext_deps += cc.find_library('atomic')
+endif
+
 deps += ['common_sfc_efx', 'bus_pci']
 sources = files(
 'sfc_ethdev.c',
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 2db8af1759..1294dbd3a7 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -24,6 +24,7 @@
 #include "sfc_flow.h"
 #include "sfc_log.h"
 #include "sfc_dp_rx.h"
+#include "sfc_mae_counter.h"
 
 struct sfc_flow_ops_by_spec {
sfc_flow_parse_cb_t *parse;
@@ -2854,6 +2855,12 @@ sfc_flow_stop(struct sfc_adapter *sa)
efx_rx_scale_context_free(sa->nic, rss->dummy_rss_context);
rss->dummy_rss_context = EFX_RSS_CONTEXT_DEFAULT;
}
+
+   /*
+* MAE counter service is not stopped on flow rule remove to avoid
+* extra work. Make sure that it is stopped here.
+*/
+   sfc_mae_counter_stop(sa);
 }
 
 int
diff --git a/drivers/net/sfc/sfc_mae.c b/drivers/net/sfc/sfc_mae.c
index e603ffbdb4..370a39da1d 100644
--- a/drivers/net/sfc/sfc_mae.c
+++ b/drivers/net/sfc/sfc_mae.c
@@ -19,6 +19,7 @@
 #include "sfc_mae_counter.h"
 #include "sfc_log.h"
 #include "sfc_switch.h"
+#include "sfc_service.h"
 
 static int
 sfc_mae_assign_entity_mport(struct sfc_adapter *sa,
@@ -30,6 +31,19 @@ sfc_mae_assign_entity_mport(struct sfc_adapter *sa,
  mportp);
 }
 
+static int
+sfc_mae_counter_registry_init(struct sfc_mae_counter_registry *registry,
+ uint32_t nb_counters_max)
+{
+   return sfc_mae_counters_init(®istry->counters, nb_counters_max);
+}
+
+static void
+sfc_mae_counter_registry_fini(struct sfc_mae_counter_registry *registry)
+{
+   sfc_mae_counters_fini(®istry->counters);
+}
+
 int
 sfc_mae_attach(struct sfc_adapter *sa)
 {
@@ -59,6 +73,15 @@ sfc_mae_attach(struct sfc_adapter *sa)
if (rc != 0)
goto fail_mae_get_limits;
 
+   sfc_log_init(sa, "init MAE counter registry");
+   rc = sfc_mae_counter_registry_init(&mae->counter_registry,
+  limits.eml_max_n_counters);
+   if (rc != 0) {
+   sfc_err(sa, "failed to init MAE counters registry for %u 
entries: %s",
+   limits.eml_max_n_counters, r

[dpdk-dev] [PATCH 00/11] net/sfc: provide Rx/Tx doorbells stats

2021-06-04 Thread Andrew Rybchenko
Rx/Tx doorbells stats are essential for performance investigation.

Along the way, fix the ethdev documentation to refine requirements on
the driver callbacks. This allows making these callbacks a bit simpler.

Add a testpmd option to show specified xstats periodically or upon
request, for example:

 * --display-xstats rx_good_packets,tx_good_packets --stats-period 1

 Port statistics 
   NIC statistics for port 0  
  RX-packets: 14102808   RX-missed: 0  RX-bytes:  7164239264
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 14102789   TX-errors: 0  TX-bytes:  7164226028

  Throughput (since last show)
  Rx-pps:  2349577  Rx-bps:   9548682392
  Tx-pps:  2349576  Tx-bps:   9548682408

  ValueRate (since last show)
  rx_good_packets 14103280 2349575
  tx_good_packets 14103626 2349573
  

 * -i --display-xstats tx_good_packets,vadapter_rx_overflow

testpmd> port start 0
...
No xstat 'vadapter_rx_overflow' on port 0 - skip it
...
testpmd> start tx_first
testpmd> show port stats all
   ValueRate (since last show)
  tx_good_packets 132545336 1420439

net/sfc part of the patch series should be applied on top of [1].

[1] https://patches.dpdk.org/project/dpdk/list/?series=17238

Ivan Ilchenko (11):
  net/sfc: fix get xstats by ID callback to use MAC stats lock
  net/sfc: fix reading adapter state without locking
  ethdev: fix docs of functions getting xstats by IDs
  ethdev: fix docs of drivers callbacks getting xstats by IDs
  net/sfc: fix xstats by ID callbacks according to ethdev
  net/sfc: fix accessing xstats by an unsorted list of IDs
  net/sfc: fix MAC stats update to work for stopped device
  net/sfc: simplify getting of available xstats case
  net/sfc: prepare to add more xstats
  net/sfc: add xstats for Rx/Tx doorbells
  app/testpmd: add option to display extended statistics

 app/test-pmd/cmdline.c|  56 +++
 app/test-pmd/config.c |  66 +++
 app/test-pmd/parameters.c |  18 +
 app/test-pmd/testpmd.c| 122 ++
 app/test-pmd/testpmd.h|  21 +
 doc/guides/testpmd_app_ug/run_app.rst |   5 +
 drivers/net/sfc/meson.build   |   1 +
 drivers/net/sfc/sfc.c |  16 +
 drivers/net/sfc/sfc.h |  18 +-
 drivers/net/sfc/sfc_dp.h  |  10 +
 drivers/net/sfc/sfc_ef10.h|   3 +-
 drivers/net/sfc/sfc_ef100_rx.c|   1 +
 drivers/net/sfc/sfc_ef100_tx.c|   1 +
 drivers/net/sfc/sfc_ef10_essb_rx.c|   3 +-
 drivers/net/sfc/sfc_ef10_rx.c |   3 +-
 drivers/net/sfc/sfc_ef10_tx.c |   1 +
 drivers/net/sfc/sfc_ethdev.c  | 185 +
 drivers/net/sfc/sfc_port.c| 127 +-
 drivers/net/sfc/sfc_rx.c  |   1 +
 drivers/net/sfc/sfc_sw_stats.c| 572 ++
 drivers/net/sfc/sfc_sw_stats.h|  49 +++
 drivers/net/sfc/sfc_tx.c  |   4 +-
 lib/ethdev/ethdev_driver.h|  43 +-
 lib/ethdev/rte_ethdev.h   |  23 +-
 24 files changed, 1243 insertions(+), 106 deletions(-)
 create mode 100644 drivers/net/sfc/sfc_sw_stats.c
 create mode 100644 drivers/net/sfc/sfc_sw_stats.h

-- 
2.30.2



[dpdk-dev] [PATCH 01/11] net/sfc: fix get xstats by ID callback to use MAC stats lock

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Take the MAC stats lock in the get xstats by ID callback before
reading the number of supported MAC stats.

Fixes: 73280c1e4ff ("net/sfc: support xstats retrieval by ID")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc_ethdev.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index 88896db1f8..d4ac61ff76 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -789,12 +789,14 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
int ret;
int rc;
 
-   if (unlikely(values == NULL) ||
-   unlikely((ids == NULL) && (n < port->mac_stats_nb_supported)))
-   return port->mac_stats_nb_supported;
-
rte_spinlock_lock(&port->mac_stats_lock);
 
+   if (unlikely(values == NULL) ||
+   unlikely(ids == NULL && n < port->mac_stats_nb_supported)) {
+   ret = port->mac_stats_nb_supported;
+   goto unlock;
+   }
+
rc = sfc_port_update_mac_stats(sa);
if (rc != 0) {
SFC_ASSERT(rc > 0);
-- 
2.30.2



[dpdk-dev] [PATCH 02/11] net/sfc: fix reading adapter state without locking

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

The MAC stats update function reads the adapter state under the MAC
stats lock but without the adapter lock. Take the adapter lock before
calling this function and drop the MAC stats lock, since there is no
point in holding both. The other place the MAC stats lock is used is
the MAC stats reset function; it is called with the adapter already
locked, so the MAC stats lock is no longer needed there.

Fixes: 1caab2f1e68 ("net/sfc: add basic statistics")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc.h|  1 -
 drivers/net/sfc/sfc_ethdev.c | 28 
 drivers/net/sfc/sfc_port.c   |  9 +++--
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index 546739bd4a..c7b0e5a30d 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -130,7 +130,6 @@ struct sfc_port {
unsigned intnb_mcast_addrs;
uint8_t *mcast_addrs;
 
-   rte_spinlock_t  mac_stats_lock;
uint64_t*mac_stats_buf;
unsigned intmac_stats_nb_supported;
efsys_mem_t mac_stats_dma_mem;
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index d4ac61ff76..d5417e5e65 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -613,7 +613,7 @@ sfc_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats 
*stats)
uint64_t *mac_stats;
int ret;
 
-   rte_spinlock_lock(&port->mac_stats_lock);
+   sfc_adapter_lock(sa);
 
ret = sfc_port_update_mac_stats(sa);
if (ret != 0)
@@ -686,7 +686,7 @@ sfc_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats 
*stats)
}
 
 unlock:
-   rte_spinlock_unlock(&port->mac_stats_lock);
+   sfc_adapter_unlock(sa);
SFC_ASSERT(ret >= 0);
return -ret;
 }
@@ -698,12 +698,15 @@ sfc_stats_reset(struct rte_eth_dev *dev)
struct sfc_port *port = &sa->port;
int rc;
 
+   sfc_adapter_lock(sa);
+
if (sa->state != SFC_ADAPTER_STARTED) {
/*
 * The operation cannot be done if port is not started; it
 * will be scheduled to be done during the next port start
 */
port->mac_stats_reset_pending = B_TRUE;
+   sfc_adapter_unlock(sa);
return 0;
}
 
@@ -711,6 +714,8 @@ sfc_stats_reset(struct rte_eth_dev *dev)
if (rc != 0)
sfc_err(sa, "failed to reset statistics (rc = %d)", rc);
 
+   sfc_adapter_unlock(sa);
+
SFC_ASSERT(rc >= 0);
return -rc;
 }
@@ -726,7 +731,7 @@ sfc_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstat *xstats,
unsigned int i;
int nstats = 0;
 
-   rte_spinlock_lock(&port->mac_stats_lock);
+   sfc_adapter_lock(sa);
 
rc = sfc_port_update_mac_stats(sa);
if (rc != 0) {
@@ -748,7 +753,7 @@ sfc_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstat *xstats,
}
 
 unlock:
-   rte_spinlock_unlock(&port->mac_stats_lock);
+   sfc_adapter_unlock(sa);
 
return nstats;
 }
@@ -789,7 +794,7 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
int ret;
int rc;
 
-   rte_spinlock_lock(&port->mac_stats_lock);
+   sfc_adapter_lock(sa);
 
if (unlikely(values == NULL) ||
unlikely(ids == NULL && n < port->mac_stats_nb_supported)) {
@@ -819,7 +824,7 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
ret = nb_written;
 
 unlock:
-   rte_spinlock_unlock(&port->mac_stats_lock);
+   sfc_adapter_unlock(sa);
 
return ret;
 }
@@ -835,9 +840,14 @@ sfc_xstats_get_names_by_id(struct rte_eth_dev *dev,
unsigned int nb_written = 0;
unsigned int i;
 
+   sfc_adapter_lock(sa);
+
if (unlikely(xstats_names == NULL) ||
-   unlikely((ids == NULL) && (size < port->mac_stats_nb_supported)))
-   return port->mac_stats_nb_supported;
+   unlikely((ids == NULL) && (size < port->mac_stats_nb_supported))) {
+   nb_supported = port->mac_stats_nb_supported;
+   sfc_adapter_unlock(sa);
+   return nb_supported;
+   }
 
for (i = 0; (i < EFX_MAC_NSTATS) && (nb_written < size); ++i) {
if (!EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i))
@@ -853,6 +863,8 @@ sfc_xstats_get_names_by_id(struct rte_eth_dev *dev,
++nb_supported;
}
 
+   sfc_adapter_unlock(sa);
+
return nb_written;
 }
 
diff --git a/drivers/net/sfc/sfc_port.c b/drivers/net/sfc/sfc_port.c
index ac117f9c48..cdc0f94f19 100644
--- a/drivers/net/sfc/sfc_port.c
+++ b/drivers/net/sfc/sfc_port.c
@@ -43,7 +43,7 @@ sfc_port_update_mac_sta

[dpdk-dev] [PATCH 03/11] ethdev: fix docs of functions getting xstats by IDs

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Document valid combinations of input arguments in accordance with
the current implementation in ethdev.

Fixes: 79c913a42f0 ("ethdev: retrieve xstats by ID")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 lib/ethdev/rte_ethdev.h | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index faf3bd901d..1f63118544 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -2873,12 +2873,15 @@ int rte_eth_xstats_get(uint16_t port_id, struct 
rte_eth_xstat *xstats,
  *   The port identifier of the Ethernet device.
  * @param xstats_names
  *   An rte_eth_xstat_name array of at least *size* elements to
- *   be filled. If set to NULL, the function returns the required number
- *   of elements.
+ *   be filled. Must not be NULL if @p ids are specified (not NULL).
  * @param ids
- *   IDs array given by app to retrieve specific statistics
+ *   IDs array given by app to retrieve specific statistics. May be NULL
+ *   to retrieve all available statistics.
  * @param size
- *   The size of the xstats_names array (number of elements).
+ *   If @p ids is not NULL, number of elements in the array with requested IDs
+ *   and number of elements in @p xstats_names to put names in. If @p ids is
+ *   NULL, number of elements in @p xstats_names to put all available 
statistics
+ *   names in.
  * @return
  *   - A positive value lower or equal to size: success. The return value
  * is the number of entries filled in the stats table.
@@ -2886,7 +2889,7 @@ int rte_eth_xstats_get(uint16_t port_id, struct 
rte_eth_xstat *xstats,
  * is too small. The return value corresponds to the size that should
  * be given to succeed. The entries in the table are not valid and
  * shall not be used by the caller.
- *   - A negative value on error (invalid port id).
+ *   - A negative value on error.
  */
 int
 rte_eth_xstats_get_names_by_id(uint16_t port_id,
@@ -2900,13 +2903,15 @@ rte_eth_xstats_get_names_by_id(uint16_t port_id,
  *   The port identifier of the Ethernet device.
  * @param ids
  *   A pointer to an ids array passed by application. This tells which
- *   statistics values function should retrieve. This parameter
- *   can be set to NULL if size is 0. In this case function will retrieve
+ *   statistics values function should retrieve. May be NULL to retrieve
  *   all available statistics.
  * @param values
  *   A pointer to a table to be filled with device statistics values.
+ *   Must not be NULL if ids are specified (not NULL).
  * @param size
- *   The size of the ids array (number of elements).
+ *   If @p ids is not NULL, number of elements in the array with requested IDs
+ *   and number of elements in values to put statistics in. If @p ids is NULL,
+ *   number of elements in values to put all available statistics in.
  * @return
  *   - A positive value lower or equal to size: success. The return value
  * is the number of entries filled in the stats table.
@@ -2914,7 +2919,7 @@ rte_eth_xstats_get_names_by_id(uint16_t port_id,
  * is too small. The return value corresponds to the size that should
  * be given to succeed. The entries in the table are not valid and
  * shall not be used by the caller.
- *   - A negative value on error (invalid port id).
+ *   - A negative value on error.
  */
 int rte_eth_xstats_get_by_id(uint16_t port_id, const uint64_t *ids,
 uint64_t *values, unsigned int size);
-- 
2.30.2
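
As a usage illustration of the combinations documented above (a minimal
sketch only; the xstat name "rx_good_packets" is just an assumed example
and error handling is reduced to early returns):

    #include <rte_ethdev.h>

    /* Look up the ID of a single xstat by name, then fetch its value. */
    static int
    fetch_one_xstat(uint16_t port_id, uint64_t *value)
    {
            uint64_t id;

            if (rte_eth_xstats_get_id_by_name(port_id, "rx_good_packets",
                                              &id) != 0)
                    return -1;

            /* ids != NULL and values != NULL; size is the element count. */
            if (rte_eth_xstats_get_by_id(port_id, &id, value, 1) != 1)
                    return -1;

            return 0;
    }

Passing ids == NULL to rte_eth_xstats_get_by_id() instead retrieves all
available statistics, provided the values array has enough elements.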



[dpdk-dev] [PATCH 04/11] ethdev: fix docs of drivers callbacks getting xstats by IDs

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Update the documentation of the xstats-by-IDs callbacks in accordance
with how ethdev uses these callbacks. Document valid combinations of
input arguments to make driver implementation simpler.

Fixes: 79c913a42f0 ("ethdev: retrieve xstats by ID")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 lib/ethdev/ethdev_driver.h | 43 --
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 40e474aa7e..fd5b7ca550 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -187,11 +187,28 @@ typedef int (*eth_xstats_get_t)(struct rte_eth_dev *dev,
struct rte_eth_xstat *stats, unsigned int n);
 /**< @internal Get extended stats of an Ethernet device. */
 
+/**
+ * @internal
+ * Get extended stats of an Ethernet device.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param ids
+ *   IDs array to retrieve specific statistics. Must not be NULL.
+ * @param values
+ *   A pointer to a table to be filled with device statistics values.
+ *   Must not be NULL.
+ * @param n
+ *   Element count in @p ids and @p values
+ *
+ * @return
+ *   - A number of filled in stats.
+ *   - A negative value on error.
+ */
 typedef int (*eth_xstats_get_by_id_t)(struct rte_eth_dev *dev,
  const uint64_t *ids,
  uint64_t *values,
  unsigned int n);
-/**< @internal Get extended stats of an Ethernet device. */
 
 /**
  * @internal
@@ -218,10 +235,32 @@ typedef int (*eth_xstats_get_names_t)(struct rte_eth_dev 
*dev,
struct rte_eth_xstat_name *xstats_names, unsigned int size);
 /**< @internal Get names of extended stats of an Ethernet device. */
 
+/**
+ * @internal
+ * Get names of extended stats of an Ethernet device.
+ * For name count, set @p xstats_names and @p ids to NULL.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param xstats_names
+ *   An rte_eth_xstat_name array of at least *size* elements to
+ *   be filled. Can be NULL together with @p ids to retrieve number of
+ *   available statistics.
+ * @param ids
+ *   IDs array to retrieve specific statistics. Can be NULL together
+ *   with @p xstats_names to retrieve number of available statistics.
+ * @param size
+ *   Size of ids and xstats_names arrays.
+ *   Element count in @p ids and @p xstats_names
+ *
+ * @return
+ *   - A number of filled in stats if both xstats_names and ids are not NULL.
+ *   - A number of available stats if both xstats_names and ids are NULL.
+ *   - A negative value on error.
+ */
 typedef int (*eth_xstats_get_names_by_id_t)(struct rte_eth_dev *dev,
struct rte_eth_xstat_name *xstats_names, const uint64_t *ids,
unsigned int size);
-/**< @internal Get names of extended stats of an Ethernet device. */
 
 typedef int (*eth_queue_stats_mapping_set_t)(struct rte_eth_dev *dev,
 uint16_t queue_id,
-- 
2.30.2



[dpdk-dev] [PATCH 05/11] net/sfc: fix xstats by ID callbacks according to ethdev

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Fix the xstats-by-ID callbacks according to ethdev usage.
Handle the combinations of input arguments required by ethdev;
sanity-check and reject other combinations on callback entry.

Fixes: 73280c1e4ff ("net/sfc: support xstats retrieval by ID")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc_ethdev.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index d5417e5e65..fca3f524a1 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -794,13 +794,10 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
int ret;
int rc;
 
-   sfc_adapter_lock(sa);
+   if (unlikely(ids == NULL || values == NULL))
+   return -EINVAL;
 
-   if (unlikely(values == NULL) ||
-   unlikely(ids == NULL && n < port->mac_stats_nb_supported)) {
-   ret = port->mac_stats_nb_supported;
-   goto unlock;
-   }
+   sfc_adapter_lock(sa);
 
rc = sfc_port_update_mac_stats(sa);
if (rc != 0) {
@@ -815,7 +812,7 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
if (!EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i))
continue;
 
-   if ((ids == NULL) || (ids[nb_written] == nb_supported))
+   if (ids[nb_written] == nb_supported)
values[nb_written++] = mac_stats[i];
 
++nb_supported;
@@ -840,10 +837,13 @@ sfc_xstats_get_names_by_id(struct rte_eth_dev *dev,
unsigned int nb_written = 0;
unsigned int i;
 
+   if (unlikely(xstats_names == NULL && ids != NULL) ||
+   unlikely(xstats_names != NULL && ids == NULL))
+   return -EINVAL;
+
sfc_adapter_lock(sa);
 
-   if (unlikely(xstats_names == NULL) ||
-   unlikely((ids == NULL) && (size < port->mac_stats_nb_supported))) {
+   if (unlikely(xstats_names == NULL && ids == NULL)) {
nb_supported = port->mac_stats_nb_supported;
sfc_adapter_unlock(sa);
return nb_supported;
@@ -853,7 +853,7 @@ sfc_xstats_get_names_by_id(struct rte_eth_dev *dev,
if (!EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i))
continue;
 
-   if ((ids == NULL) || (ids[nb_written] == nb_supported)) {
+   if (ids[nb_written] == nb_supported) {
char *name = xstats_names[nb_written++].name;
 
strlcpy(name, efx_mac_stat_name(sa->nic, i),
-- 
2.30.2



[dpdk-dev] [PATCH 07/11] net/sfc: fix MAC stats update to work for stopped device

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Fixes: 1caab2f1e68 ("net/sfc: add basic statistics")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc.h|  2 +-
 drivers/net/sfc/sfc_ethdev.c |  6 +++---
 drivers/net/sfc/sfc_port.c   | 11 +++
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index 972d32606d..1594f934ba 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -422,7 +422,7 @@ int sfc_port_start(struct sfc_adapter *sa);
 void sfc_port_stop(struct sfc_adapter *sa);
 void sfc_port_link_mode_to_info(efx_link_mode_t link_mode,
struct rte_eth_link *link_info);
-int sfc_port_update_mac_stats(struct sfc_adapter *sa);
+int sfc_port_update_mac_stats(struct sfc_adapter *sa, boolean_t manual_update);
 int sfc_port_reset_mac_stats(struct sfc_adapter *sa);
 int sfc_set_rx_mode(struct sfc_adapter *sa);
 int sfc_set_rx_mode_unchecked(struct sfc_adapter *sa);
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index ae9304f90f..bbc22723f6 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -615,7 +615,7 @@ sfc_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats 
*stats)
 
sfc_adapter_lock(sa);
 
-   ret = sfc_port_update_mac_stats(sa);
+   ret = sfc_port_update_mac_stats(sa, B_FALSE);
if (ret != 0)
goto unlock;
 
@@ -733,7 +733,7 @@ sfc_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstat *xstats,
 
sfc_adapter_lock(sa);
 
-   rc = sfc_port_update_mac_stats(sa);
+   rc = sfc_port_update_mac_stats(sa, B_FALSE);
if (rc != 0) {
SFC_ASSERT(rc > 0);
nstats = -rc;
@@ -797,7 +797,7 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
 
sfc_adapter_lock(sa);
 
-   rc = sfc_port_update_mac_stats(sa);
+   rc = sfc_port_update_mac_stats(sa, B_FALSE);
if (rc != 0) {
SFC_ASSERT(rc > 0);
ret = -rc;
diff --git a/drivers/net/sfc/sfc_port.c b/drivers/net/sfc/sfc_port.c
index bb9e01d96b..8c432c15f5 100644
--- a/drivers/net/sfc/sfc_port.c
+++ b/drivers/net/sfc/sfc_port.c
@@ -26,7 +26,8 @@
 /**
  * Update MAC statistics in the buffer.
  *
- * @param  sa  Adapter
+ * @param  sa  Adapter
+ * @param  force_uploadFlag to upload MAC stats in any case
  *
  * @return Status code
  * @retval 0   Success
@@ -34,7 +35,7 @@
  * @retval ENOMEM  Memory allocation failure
  */
 int
-sfc_port_update_mac_stats(struct sfc_adapter *sa)
+sfc_port_update_mac_stats(struct sfc_adapter *sa, boolean_t force_upload)
 {
struct sfc_port *port = &sa->port;
efsys_mem_t *esmp = &port->mac_stats_dma_mem;
@@ -46,14 +47,14 @@ sfc_port_update_mac_stats(struct sfc_adapter *sa)
SFC_ASSERT(sfc_adapter_is_locked(sa));
 
if (sa->state != SFC_ADAPTER_STARTED)
-   return EINVAL;
+   return 0;
 
/*
 * If periodic statistics DMA'ing is off or if not supported,
 * make a manual request and keep an eye on timer if need be
 */
if (!port->mac_stats_periodic_dma_supported ||
-   (port->mac_stats_update_period_ms == 0)) {
+   (port->mac_stats_update_period_ms == 0) || force_upload) {
if (port->mac_stats_update_period_ms != 0) {
uint64_t timestamp = sfc_get_system_msecs();
 
@@ -367,6 +368,8 @@ sfc_port_stop(struct sfc_adapter *sa)
(void)efx_mac_stats_periodic(sa->nic, &sa->port.mac_stats_dma_mem,
 0, B_FALSE);
 
+   sfc_port_update_mac_stats(sa, B_TRUE);
+
efx_port_fini(sa->nic);
efx_filter_fini(sa->nic);
 
-- 
2.30.2



[dpdk-dev] [PATCH 06/11] net/sfc: fix accessing xstats by an unsorted list of IDs

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

A device may support only some MAC stats. Add a mapping from IDs to the
subset of supported MAC stats for each port.
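
For illustration: if a port supports only the MAC stats with EFX indices
0, 3 and 7, the new mac_stats_by_id array becomes {0, 3, 7}, so xstat
ID 1 maps to MAC stat 3 regardless of the order in which IDs are requested.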

Fixes: 73280c1e4ff ("net/sfc: support xstats retrieval by ID")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc.h|  2 ++
 drivers/net/sfc/sfc_ethdev.c | 44 ++--
 drivers/net/sfc/sfc_port.c   | 29 ++--
 3 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index c7b0e5a30d..972d32606d 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -141,6 +141,8 @@ struct sfc_port {
 
uint32_tmac_stats_mask[EFX_MAC_STATS_MASK_NPAGES];
 
+   unsigned intmac_stats_by_id[EFX_MAC_NSTATS];
+
uint64_tipackets;
 };
 
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index fca3f524a1..ae9304f90f 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -788,8 +788,6 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
struct sfc_port *port = &sa->port;
uint64_t *mac_stats;
-   unsigned int nb_supported = 0;
-   unsigned int nb_written = 0;
unsigned int i;
int ret;
int rc;
@@ -808,17 +806,19 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
 
mac_stats = port->mac_stats_buf;
 
-   for (i = 0; (i < EFX_MAC_NSTATS) && (nb_written < n); ++i) {
-   if (!EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i))
-   continue;
-
-   if (ids[nb_written] == nb_supported)
-   values[nb_written++] = mac_stats[i];
+   SFC_ASSERT(port->mac_stats_nb_supported <=
+  RTE_DIM(port->mac_stats_by_id));
 
-   ++nb_supported;
+   for (i = 0; i < n; i++) {
+   if (ids[i] < port->mac_stats_nb_supported) {
+   values[i] = mac_stats[port->mac_stats_by_id[ids[i]]];
+   } else {
+   ret = i;
+   goto unlock;
+   }
}
 
-   ret = nb_written;
+   ret = n;
 
 unlock:
sfc_adapter_unlock(sa);
@@ -833,8 +833,7 @@ sfc_xstats_get_names_by_id(struct rte_eth_dev *dev,
 {
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
struct sfc_port *port = &sa->port;
-   unsigned int nb_supported = 0;
-   unsigned int nb_written = 0;
+   unsigned int nb_supported;
unsigned int i;
 
if (unlikely(xstats_names == NULL && ids != NULL) ||
@@ -849,23 +848,24 @@ sfc_xstats_get_names_by_id(struct rte_eth_dev *dev,
return nb_supported;
}
 
-   for (i = 0; (i < EFX_MAC_NSTATS) && (nb_written < size); ++i) {
-   if (!EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i))
-   continue;
-
-   if (ids[nb_written] == nb_supported) {
-   char *name = xstats_names[nb_written++].name;
+   SFC_ASSERT(port->mac_stats_nb_supported <=
+  RTE_DIM(port->mac_stats_by_id));
 
-   strlcpy(name, efx_mac_stat_name(sa->nic, i),
+   for (i = 0; i < size; i++) {
+   if (ids[i] < port->mac_stats_nb_supported) {
+   strlcpy(xstats_names[i].name,
+   efx_mac_stat_name(sa->nic,
+port->mac_stats_by_id[ids[i]]),
sizeof(xstats_names[0].name));
+   } else {
+   sfc_adapter_unlock(sa);
+   return i;
}
-
-   ++nb_supported;
}
 
sfc_adapter_unlock(sa);
 
-   return nb_written;
+   return size;
 }
 
 static int
diff --git a/drivers/net/sfc/sfc_port.c b/drivers/net/sfc/sfc_port.c
index cdc0f94f19..bb9e01d96b 100644
--- a/drivers/net/sfc/sfc_port.c
+++ b/drivers/net/sfc/sfc_port.c
@@ -157,6 +157,27 @@ sfc_port_phy_caps_to_max_link_speed(uint32_t phy_caps)
 
 #endif
 
+static void
+sfc_port_fill_mac_stats_info(struct sfc_adapter *sa)
+{
+   unsigned int mac_stats_nb_supported = 0;
+   struct sfc_port *port = &sa->port;
+   unsigned int stat_idx;
+
+   efx_mac_stats_get_mask(sa->nic, port->mac_stats_mask,
+  sizeof(port->mac_stats_mask));
+
+   for (stat_idx = 0; stat_idx < EFX_MAC_NSTATS; ++stat_idx) {
+   if (!EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, stat_idx))
+   continue;
+
+   port->mac_stats_by_id[mac_stats_nb_supported] = stat_idx;
+   mac_stats_nb_supported++;
+   }
+
+   port->mac_stats_nb_supported = mac_stats_nb_supported;
+}
+
 int
 sfc_port_start(struct 

[dpdk-dev] [PATCH 08/11] net/sfc: simplify getting of available xstats case

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

There is no point in recalculating the number of available xstats on
each request. The number is calculated once on device start
and may be returned on subsequent calls.

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc_ethdev.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index bbc22723f6..f0567a71d0 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -733,6 +733,11 @@ sfc_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstat *xstats,
 
sfc_adapter_lock(sa);
 
+   if (unlikely(xstats == NULL)) {
+   nstats = port->mac_stats_nb_supported;
+   goto unlock;
+   }
+
rc = sfc_port_update_mac_stats(sa, B_FALSE);
if (rc != 0) {
SFC_ASSERT(rc > 0);
@@ -744,7 +749,7 @@ sfc_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstat *xstats,
 
for (i = 0; i < EFX_MAC_NSTATS; ++i) {
if (EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i)) {
-   if (xstats != NULL && nstats < (int)xstats_count) {
+   if (nstats < (int)xstats_count) {
xstats[nstats].id = nstats;
xstats[nstats].value = mac_stats[i];
}
@@ -768,9 +773,16 @@ sfc_xstats_get_names(struct rte_eth_dev *dev,
unsigned int i;
unsigned int nstats = 0;
 
+   if (unlikely(xstats_names == NULL)) {
+   sfc_adapter_lock(sa);
+   nstats = port->mac_stats_nb_supported;
+   sfc_adapter_unlock(sa);
+   return nstats;
+   }
+
for (i = 0; i < EFX_MAC_NSTATS; ++i) {
if (EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i)) {
-   if (xstats_names != NULL && nstats < xstats_count)
+   if (nstats < xstats_count)
strlcpy(xstats_names[nstats].name,
efx_mac_stat_name(sa->nic, i),
sizeof(xstats_names[0].name));
-- 
2.30.2



[dpdk-dev] [PATCH 09/11] net/sfc: prepare to add more xstats

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Move the MAC stats retrieval code that involves locking into separate
functions to simplify the addition of new xstats.

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/sfc.h|  4 ++
 drivers/net/sfc/sfc_ethdev.c | 73 
 drivers/net/sfc/sfc_port.c   | 80 
 3 files changed, 92 insertions(+), 65 deletions(-)

diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index 1594f934ba..58b8c2c2ad 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -423,6 +423,10 @@ void sfc_port_stop(struct sfc_adapter *sa);
 void sfc_port_link_mode_to_info(efx_link_mode_t link_mode,
struct rte_eth_link *link_info);
 int sfc_port_update_mac_stats(struct sfc_adapter *sa, boolean_t manual_update);
+int sfc_port_get_mac_stats(struct sfc_adapter *sa, struct rte_eth_xstat 
*xstats,
+  unsigned int xstats_count, unsigned int *nb_written);
+int sfc_port_get_mac_stats_by_id(struct sfc_adapter *sa, const uint64_t *ids,
+uint64_t *values, unsigned int n);
 int sfc_port_reset_mac_stats(struct sfc_adapter *sa);
 int sfc_set_rx_mode(struct sfc_adapter *sa);
 int sfc_set_rx_mode_unchecked(struct sfc_adapter *sa);
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index f0567a71d0..dd7e5c253a 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -726,41 +726,17 @@ sfc_xstats_get(struct rte_eth_dev *dev, struct 
rte_eth_xstat *xstats,
 {
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
struct sfc_port *port = &sa->port;
-   uint64_t *mac_stats;
-   int rc;
-   unsigned int i;
-   int nstats = 0;
-
-   sfc_adapter_lock(sa);
+   unsigned int nb_written = 0;
+   unsigned int nb_supp;
 
if (unlikely(xstats == NULL)) {
-   nstats = port->mac_stats_nb_supported;
-   goto unlock;
-   }
-
-   rc = sfc_port_update_mac_stats(sa, B_FALSE);
-   if (rc != 0) {
-   SFC_ASSERT(rc > 0);
-   nstats = -rc;
-   goto unlock;
-   }
-
-   mac_stats = port->mac_stats_buf;
-
-   for (i = 0; i < EFX_MAC_NSTATS; ++i) {
-   if (EFX_MAC_STAT_SUPPORTED(port->mac_stats_mask, i)) {
-   if (nstats < (int)xstats_count) {
-   xstats[nstats].id = nstats;
-   xstats[nstats].value = mac_stats[i];
-   }
-   nstats++;
-   }
+   sfc_adapter_lock(sa);
+   nb_supp = port->mac_stats_nb_supported;
+   sfc_adapter_unlock(sa);
+   return nb_supp;
}
 
-unlock:
-   sfc_adapter_unlock(sa);
-
-   return nstats;
+   return sfc_port_get_mac_stats(sa, xstats, xstats_count, &nb_written);
 }
 
 static int
@@ -798,44 +774,11 @@ sfc_xstats_get_by_id(struct rte_eth_dev *dev, const 
uint64_t *ids,
 uint64_t *values, unsigned int n)
 {
struct sfc_adapter *sa = sfc_adapter_by_eth_dev(dev);
-   struct sfc_port *port = &sa->port;
-   uint64_t *mac_stats;
-   unsigned int i;
-   int ret;
-   int rc;
 
if (unlikely(ids == NULL || values == NULL))
return -EINVAL;
 
-   sfc_adapter_lock(sa);
-
-   rc = sfc_port_update_mac_stats(sa, B_FALSE);
-   if (rc != 0) {
-   SFC_ASSERT(rc > 0);
-   ret = -rc;
-   goto unlock;
-   }
-
-   mac_stats = port->mac_stats_buf;
-
-   SFC_ASSERT(port->mac_stats_nb_supported <=
-  RTE_DIM(port->mac_stats_by_id));
-
-   for (i = 0; i < n; i++) {
-   if (ids[i] < port->mac_stats_nb_supported) {
-   values[i] = mac_stats[port->mac_stats_by_id[ids[i]]];
-   } else {
-   ret = i;
-   goto unlock;
-   }
-   }
-
-   ret = n;
-
-unlock:
-   sfc_adapter_unlock(sa);
-
-   return ret;
+   return sfc_port_get_mac_stats_by_id(sa, ids, values, n);
 }
 
 static int
diff --git a/drivers/net/sfc/sfc_port.c b/drivers/net/sfc/sfc_port.c
index 8c432c15f5..f6689a17c0 100644
--- a/drivers/net/sfc/sfc_port.c
+++ b/drivers/net/sfc/sfc_port.c
@@ -636,3 +636,83 @@ sfc_port_link_mode_to_info(efx_link_mode_t link_mode,
 
link_info->link_autoneg = ETH_LINK_AUTONEG;
 }
+
+int
+sfc_port_get_mac_stats(struct sfc_adapter *sa, struct rte_eth_xstat *xstats,
+  unsigned int xstats_count, unsigned int *nb_written)
+{
+   struct sfc_port *port = &sa->port;
+   uint64_t *mac_stats;
+   unsigned int i;
+   int nstats = 0;
+   int ret;
+
+   sfc_adapter_lock(sa);
+
+   ret = sfc_port_update_mac_stats(sa, B_FALSE);
+   if (ret != 0) {
+   SFC_ASSERT(re

[dpdk-dev] [PATCH 10/11] net/sfc: add xstats for Rx/Tx doorbells

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Rx/Tx doorbell statistics are collected in software and are
available per queue. These stats are useful for performance
investigation.

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Reviewed-by: Andy Moreton 
---
 drivers/net/sfc/meson.build|   1 +
 drivers/net/sfc/sfc.c  |  16 +
 drivers/net/sfc/sfc.h  |   9 +
 drivers/net/sfc/sfc_dp.h   |  10 +
 drivers/net/sfc/sfc_ef10.h |   3 +-
 drivers/net/sfc/sfc_ef100_rx.c |   1 +
 drivers/net/sfc/sfc_ef100_tx.c |   1 +
 drivers/net/sfc/sfc_ef10_essb_rx.c |   3 +-
 drivers/net/sfc/sfc_ef10_rx.c  |   3 +-
 drivers/net/sfc/sfc_ef10_tx.c  |   1 +
 drivers/net/sfc/sfc_ethdev.c   | 124 +--
 drivers/net/sfc/sfc_port.c |  10 +-
 drivers/net/sfc/sfc_rx.c   |   1 +
 drivers/net/sfc/sfc_sw_stats.c | 572 +
 drivers/net/sfc/sfc_sw_stats.h |  49 +++
 drivers/net/sfc/sfc_tx.c   |   4 +-
 16 files changed, 772 insertions(+), 36 deletions(-)
 create mode 100644 drivers/net/sfc/sfc_sw_stats.c
 create mode 100644 drivers/net/sfc/sfc_sw_stats.h

diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
index 32b58e3d76..c40c8e12fe 100644
--- a/drivers/net/sfc/meson.build
+++ b/drivers/net/sfc/meson.build
@@ -56,6 +56,7 @@ sources = files(
 'sfc.c',
 'sfc_mcdi.c',
 'sfc_sriov.c',
+'sfc_sw_stats.c',
 'sfc_intr.c',
 'sfc_ev.c',
 'sfc_port.c',
diff --git a/drivers/net/sfc/sfc.c b/drivers/net/sfc/sfc.c
index 4097cf39de..274a98e228 100644
--- a/drivers/net/sfc/sfc.c
+++ b/drivers/net/sfc/sfc.c
@@ -24,6 +24,7 @@
 #include "sfc_tx.h"
 #include "sfc_kvargs.h"
 #include "sfc_tweak.h"
+#include "sfc_sw_stats.h"
 
 
 int
@@ -636,10 +637,17 @@ sfc_configure(struct sfc_adapter *sa)
if (rc != 0)
goto fail_tx_configure;
 
+   rc = sfc_sw_xstats_configure(sa);
+   if (rc != 0)
+   goto fail_sw_xstats_configure;
+
sa->state = SFC_ADAPTER_CONFIGURED;
sfc_log_init(sa, "done");
return 0;
 
+fail_sw_xstats_configure:
+   sfc_tx_close(sa);
+
 fail_tx_configure:
sfc_rx_close(sa);
 
@@ -666,6 +674,7 @@ sfc_close(struct sfc_adapter *sa)
SFC_ASSERT(sa->state == SFC_ADAPTER_CONFIGURED);
sa->state = SFC_ADAPTER_CLOSING;
 
+   sfc_sw_xstats_close(sa);
sfc_tx_close(sa);
sfc_rx_close(sa);
sfc_port_close(sa);
@@ -891,6 +900,10 @@ sfc_attach(struct sfc_adapter *sa)
 
sfc_flow_init(sa);
 
+   rc = sfc_sw_xstats_init(sa);
+   if (rc != 0)
+   goto fail_sw_xstats_init;
+
/*
 * Create vSwitch to be able to use VFs when PF is not started yet
 * as DPDK port. VFs should be able to talk to each other even
@@ -906,6 +919,9 @@ sfc_attach(struct sfc_adapter *sa)
return 0;
 
 fail_sriov_vswitch_create:
+   sfc_sw_xstats_close(sa);
+
+fail_sw_xstats_init:
sfc_flow_fini(sa);
sfc_mae_detach(sa);
 
diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index 58b8c2c2ad..331e06bac6 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -217,6 +217,14 @@ struct sfc_counter_rxq {
struct rte_mempool  *mp;
 };
 
+struct sfc_sw_xstats {
+   uint64_t*reset_vals;
+
+   rte_spinlock_t  queues_bitmap_lock;
+   void*queues_bitmap_mem;
+   struct rte_bitmap   *queues_bitmap;
+};
+
 /* Adapter private data */
 struct sfc_adapter {
/*
@@ -249,6 +257,7 @@ struct sfc_adapter {
struct sfc_sriovsriov;
struct sfc_intr intr;
struct sfc_port port;
+   struct sfc_sw_xstatssw_xstats;
struct sfc_filter   filter;
struct sfc_mae  mae;
 
diff --git a/drivers/net/sfc/sfc_dp.h b/drivers/net/sfc/sfc_dp.h
index 61c1a3fbac..7fd8f34b0f 100644
--- a/drivers/net/sfc/sfc_dp.h
+++ b/drivers/net/sfc/sfc_dp.h
@@ -42,6 +42,16 @@ enum sfc_dp_type {
 
 /** Datapath queue run-time information */
 struct sfc_dp_queue {
+   /*
+* Typically the structure is located at the end of Rx/Tx queue
+* data structure and not used on datapath. So, it is not a
+* problem to have extra fields even if not used. However,
+* put stats at top of the structure to be closer to fields
+* used on datapath or reap to have more chances to be cache-hot.
+*/
+   uint32_trx_dbells;
+   uint32_ttx_dbells;
+
uint16_tport_id;
uint16_tqueue_id;
struct rte_pci_addr pci_addr;
diff --git a/drivers/net/sfc/sfc_ef10.h b/drivers/net/sfc/sfc_ef10.h
index ad4c1fdbef..e9bb72e28b 100644
--- a/drivers/net/sfc/sfc_ef10.h
+++ b/dri

[dpdk-dev] [PATCH 11/11] app/testpmd: add option to display extended statistics

2021-06-04 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Add a 'display-xstats' option to be used together with the Rx/Tx statistics
output (i.e. the 'stats-period' option or the 'show port stats' interactive
command) to display a specified list of extended statistics.
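
For example (illustrative invocation only, based on the comma-separated list
parsed by this patch): running testpmd with
"--display-xstats rx_good_packets,tx_good_packets" together with
"--stats-period 1" would print those two xstats along with the periodic
port statistics.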

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
---
 app/test-pmd/cmdline.c|  56 
 app/test-pmd/config.c |  66 ++
 app/test-pmd/parameters.c |  18 
 app/test-pmd/testpmd.c| 122 ++
 app/test-pmd/testpmd.h|  21 +
 doc/guides/testpmd_app_ug/run_app.rst |   5 ++
 6 files changed, 288 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0268b18f95..b1fd136982 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3613,6 +3613,62 @@ cmdline_parse_inst_t cmd_stop = {
 
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
+int
+parse_xstats_list(char *in_str, struct rte_eth_xstat_name **xstats,
+ unsigned int *xstats_num)
+{
+   int max_names_nb, names_nb;
+   int stringlen;
+   char **names;
+   char *str;
+   int ret;
+   int i;
+
+   names = NULL;
+   str = strdup(in_str);
+   if (str == NULL) {
+   ret = ENOMEM;
+   goto out;
+   }
+   stringlen = strlen(str);
+
+   for (i = 0, max_names_nb = 1; str[i] != '\0'; i++) {
+   if (str[i] == ',')
+   max_names_nb++;
+   }
+
+   names = calloc(max_names_nb, sizeof(*names));
+   if (names == NULL) {
+   ret = ENOMEM;
+   goto out;
+   }
+
+   names_nb = rte_strsplit(str, stringlen, names, max_names_nb, ',');
+   printf("max names is %d\n", max_names_nb);
+   if (names_nb < 0) {
+   ret = EINVAL;
+   goto out;
+   }
+
+   *xstats = calloc(names_nb, sizeof(**xstats));
+   if (*xstats == NULL) {
+   ret = ENOMEM;
+   goto out;
+   }
+
+   for (i = 0; i < names_nb; i++)
+   rte_strscpy((*xstats)[i].name, names[i],
+   sizeof((*xstats)[i].name));
+
+   *xstats_num = names_nb;
+   ret = 0;
+
+out:
+   free(names);
+   free(str);
+   return ret;
+}
+
 unsigned int
 parse_item_list(char* str, const char* item_name, unsigned int max_items,
unsigned int *parsed_items, int check_unique_values)
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 43c79b5021..8e71b664cd 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -173,6 +173,70 @@ print_ethaddr(const char *name, struct rte_ether_addr 
*eth_addr)
printf("%s%s", name, buf);
 }
 
+static void
+nic_xstats_display_periodic(portid_t port_id)
+{
+   struct xstat_display_info *xstats_info;
+   uint64_t *prev_values, *curr_values;
+   uint64_t diff_value, value_rate;
+   uint64_t *ids, *ids_supp;
+   struct timespec cur_time;
+   unsigned int i, i_supp;
+   size_t ids_supp_sz;
+   uint64_t diff_ns;
+   int rc;
+
+   xstats_info = &xstats_per_port[port_id];
+
+   ids_supp_sz = xstats_info->ids_supp_sz;
+   if (xstats_display_num == 0 || ids_supp_sz == 0)
+   return;
+
+   printf("\n");
+
+   ids = xstats_info->ids;
+   ids_supp = xstats_info->ids_supp;
+   prev_values = xstats_info->prev_values;
+   curr_values = xstats_info->curr_values;
+
+   rc = rte_eth_xstats_get_by_id(port_id, ids_supp, curr_values,
+ ids_supp_sz);
+   if (rc != (int)ids_supp_sz) {
+   fprintf(stderr, "%s: Failed to get values of %zu supported 
xstats for port %u - return code %d\n",
+   __func__, ids_supp_sz, port_id, rc);
+   return;
+   }
+
+   diff_ns = 0;
+   if (clock_gettime(CLOCK_TYPE_ID, &cur_time) == 0) {
+   uint64_t ns;
+
+   ns = cur_time.tv_sec * NS_PER_SEC;
+   ns += cur_time.tv_nsec;
+
+   if (xstats_info->prev_ns != 0)
+   diff_ns = ns - xstats_info->prev_ns;
+   xstats_info->prev_ns = ns;
+   }
+
+   printf("%-31s%-17s%s\n", " ", "Value", "Rate (since last show)");
+   for (i = i_supp = 0; i < xstats_display_num; i++) {
+   if (ids[i] == XSTAT_ID_INVALID)
+   continue;
+
+   diff_value = (curr_values[i_supp] > prev_values[i]) ?
+(curr_values[i_supp] - prev_values[i]) : 0;
+   prev_values[i] = curr_values[i_supp];
+   value_rate = diff_ns > 0 ?
+   (double)diff_value / diff_ns * NS_PER_SEC : 0;
+
+   printf("  %-25s%12"PRIu64" %15"PRIu64"\n",
+  xstats_display[i].name, curr_values[i_supp], value_rate);
+
+   i_supp++;
+   }
+}
+
 void
 nic_stats_display(portid_t p

Re: [dpdk-dev] [PATCH] app/procinfo: add device registers dump

2021-06-04 Thread Pattan, Reshma



> -Original Message-
> From: Min Hu (Connor) 

 

> + ret = rte_eth_dev_get_reg_info(i, ®_info);
> + if (ret) {
> + printf("Error getting device reg info: %d\n", ret);
> + continue;
> + }
> +
> + buf_size = reg_info.length * reg_info.width;


If the goal is to get the regs length, you can directly call the
"rte_ethtool_get_regs_len(uint16_t port_id)" API instead of writing
the above logic again.
And use the returned length in the malloc below.


> + fp_regs = fopen(file_name, "wb");
> + if (fp_regs == NULL) {
> + printf("Error during opening '%s' for writing\n",
> + file_name);

Better to print the error string from fopen()'s errno on failure, to indicate
the exact error.

> + } else {
> + if ((int)fwrite(buf_data, 1, buf_size, fp_regs) !=

Better to have "(int)fwrite(buf_data, 1, buf_size, fp_regs)" on a separate line
and use the returned value inside the if check.

> + buf_size)
> + printf("Error during writing %s\n",
> + file_prefix);

Better to print the error string from fwrite()'s errno on failure, to indicate
the exact error.

> + else
> + printf("dump device (%s) regs successfully, "

Reframe the sentence to "Device regs dumped successfully".
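
Putting the suggestions above together, a minimal sketch of what the write
path could look like (names such as file_name, buf_data and buf_size mirror
the patch under review and are assumptions here):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    static void
    write_reg_dump(const char *file_name, const void *buf_data, size_t buf_size)
    {
            FILE *fp_regs = fopen(file_name, "wb");
            size_t nb_written;

            if (fp_regs == NULL) {
                    printf("Error during opening '%s' for writing: %s\n",
                           file_name, strerror(errno));
                    return;
            }

            nb_written = fwrite(buf_data, 1, buf_size, fp_regs);
            if (nb_written != buf_size)
                    printf("Error during writing '%s': %s\n",
                           file_name, strerror(errno));
            else
                    printf("Device regs dumped successfully to '%s'\n",
                           file_name);

            fclose(fp_regs);
    }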


Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Jerin Jacob
On Fri, Jun 4, 2021 at 6:25 PM Thomas Monjalon  wrote:
>
> 03/06/2021 13:38, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 4:00 PM Thomas Monjalon  wrote:
> > > 03/06/2021 12:04, Jerin Jacob:
> > > > On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon  
> > > > wrote:
> > > > > 03/06/2021 11:20, Jerin Jacob:
> > > > > > The device needs have a queue kind of structure
> > > > > > and it is mapping to core to have a notion of configure. 
> > > > > > queue_setup,
> > > > > > start and stop etc
> > > > >
> > > > > Why is it a requirement to call it a device API?
> > > >
> > > > Then we need to define what needs to call as device library vs library 
> > > > and how?
> > > > Why mempool is not called a  device library vs library?
> > >
> > > My view is simple:
> > > if it has drivers, it is a device API, except bus and mempool libs.
> >
> > rte_secuity has drivers but it is not called a device library.
>
> rte_security is a monster beast :)
> Yes it has rte_security_ops implemented in net and crypto drivers,
> but it is an API extension only, there is no driver dedicated to security.
>
> > > About mempool, it started as a standard lib and got extended for HW 
> > > support.
> >
> > Yes. We did not change to device library as it was fundamentally
> > different than another DPDK deices
> > when we added the device support.
> >
> > > > and why all
> > > > other device library has a common structure like queues and
> > > > it binding core etc. I tried to explain above the similar attributes
> > > > for dpdk device libraries[1] which I think, it a requirement so
> > > > that the end user will have familiarity with device libraries rather
> > > > than each one has separate General guidelines and principles.
> > > >
> > > > I think, it is more TB discussion topic and decides on this because I
> > > > don't see in technical issue in calling it a library.
> > >
> > > The naming is just a choice.
> >
> > Not sure.
> >
> > > Yesterday morning it was called lib/gpu/
> > > and in the evening it was renamed lib/gpudev/
> > > so no technical issue :)
> > >
> > > But the design of the API with queues or other paradigm
> > > is something I would like to discuss here.
> >
> > Yeah, That is important. IMO, That defines what needs to be a device 
> > library.
> >
> > > Note: there was no intent to publish GPU processing control
> > > in DPDK 21.08. We want to focus on GPU memory in 21.08,
> > > but I understand it is a key decision in the big picture.
> >
> > if the scope is only memory allocation, IMO, it is better to make a library.
>
> No it is only the first step.
>
> > > What would be your need and would you design such API?
> >
> > For me, there is no need for gpu library(as of now). May GPU consumers
> > can define what they need to control using the library.
>
> We need to integrate GPU processing workload in the DPDK workflow
> as a generic API.
> There could be 2 modes:
> - queue of tasks
> - tasks in an infinite loop
> In both modes, we could get completion notifications
> with an interrupt/callback or by polling a shared memory.


OK. If we have an enqueue/dequeue kind of operation with a queue model, then it
makes sense to have a device model. It was not there in your initial
patch, but if we are adding it
in the future then it is OK.





Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Jerin Jacob
On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon  wrote:
>
> 04/06/2021 15:59, Andrew Rybchenko:
> > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > 04/06/2021 15:05, Andrew Rybchenko:
> > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > >>> 04/06/2021 13:09, Jerin Jacob:
> >  On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  
> >  wrote:
> > > 03/06/2021 11:33, Ferruh Yigit:
> > >> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > >>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon 
> > >>>  wrote:
> >  +  [gpudev] (@ref rte_gpudev.h),
> > >>>
> > >>> Since this device does not have a queue etc? Shouldn't make it a
> > >>> library like mempool with vendor-defined ops?
> > >>
> > >> +1
> > >>
> > >> Current RFC announces additional memory allocation capabilities, 
> > >> which can suits
> > >> better as extension to existing memory related library instead of a 
> > >> new device
> > >> abstraction library.
> > >
> > > It is not replacing mempool.
> > > It is more at the same level as EAL memory management:
> > > allocate simple buffer, but with the exception it is done
> > > on a specific device, so it requires a device ID.
> > >
> > > The other reason it needs to be a full library is that
> > > it will start a workload on the GPU and get completion notification
> > > so we can integrate the GPU workload in a packet processing pipeline.
> > 
> >  I might have confused you. My intention is not to make to fit under 
> >  mempool API.
> > 
> >  I agree that we need a separate library for this. My objection is only
> >  to not call libgpudev and
> >  call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> >  it not like existing "device libraries" in DPDK and
> >  it like other "libraries" in DPDK.
> > >>>
> > >>> I think we should define a queue of processing actions,
> > >>> so it looks like other device libraries.
> > >>> And anyway I think a library managing a device class,
> > >>> and having some device drivers deserves the name of device library.
> > >>>
> > >>> I would like to read more opinions.
> > >>
> > >> Since the library is an unified interface to GPU device drivers
> > >> I think it should be named as in the patch - gpudev.
> > >>
> > >> Mempool looks like an exception here - initially it was pure SW
> > >> library, but not there are HW backends and corresponding device
> > >> drivers.
> > >>
> > >> What I don't understand where is GPU specifics here?
> > >
> > > That's an interesting question.
> > > Let's ask first what is a GPU for DPDK?
> > > I think it is like a sub-CPU with high parallel execution capabilities,
> > > and it is controlled by the CPU.
> >
> > I have no good ideas how to name it in accordance with
> > above description to avoid "G" which for "Graphics" if
> > understand correctly. However, may be it is not required.
> > No strong opinion on the topic, but unbinding from
> > "Graphics" would be nice.
>
> That's a question I ask myself for months now.
> I am not able to find a better name,
> and I start thinking that "GPU" is famous enough in high-load computing
> to convey the idea of what we can expect.


The closest I can think of is big-little architecture in ARM SoC.
https://www.arm.com/why-arm/technologies/big-little

We do have a similar architecture, where the "coprocessor" is part of
the main CPU.
Its operations are:
- Download firmware
- Memory mapping for Main CPU memory by the co-processor
- Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.

If your scope is something similar and no graphics is involved here, then
we can remove the G.

Coincidentally, yesterday I had an interaction with Elena about the
same topic for baseband-related work in ORAN, where the
GPU is used for baseband processing instead of graphics. (So I can
understand the big picture of this library.)

I can think of "coprocessor-dev" as one of the names. We do have
similar machine learning co-processors (for compute);
if we can keep a generic name and it covers the above functions, we may
use this subsystem for them as well in the future.












Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Thomas Monjalon
04/06/2021 17:20, Jerin Jacob:
> On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon  wrote:
> > 04/06/2021 15:59, Andrew Rybchenko:
> > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > >>> 04/06/2021 13:09, Jerin Jacob:
> > >  On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon  
> > >  wrote:
> > > > 03/06/2021 11:33, Ferruh Yigit:
> > > >> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > >>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon 
> > > >>>  wrote:
> > >  +  [gpudev] (@ref rte_gpudev.h),
> > > >>>
> > > >>> Since this device does not have a queue etc? Shouldn't make it a
> > > >>> library like mempool with vendor-defined ops?
> > > >>
> > > >> +1
> > > >>
> > > >> Current RFC announces additional memory allocation capabilities, 
> > > >> which can suits
> > > >> better as extension to existing memory related library instead of 
> > > >> a new device
> > > >> abstraction library.
> > > >
> > > > It is not replacing mempool.
> > > > It is more at the same level as EAL memory management:
> > > > allocate simple buffer, but with the exception it is done
> > > > on a specific device, so it requires a device ID.
> > > >
> > > > The other reason it needs to be a full library is that
> > > > it will start a workload on the GPU and get completion notification
> > > > so we can integrate the GPU workload in a packet processing 
> > > > pipeline.
> > > 
> > >  I might have confused you. My intention is not to make to fit under 
> > >  mempool API.
> > > 
> > >  I agree that we need a separate library for this. My objection is 
> > >  only
> > >  to not call libgpudev and
> > >  call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> > >  it not like existing "device libraries" in DPDK and
> > >  it like other "libraries" in DPDK.
> > > >>>
> > > >>> I think we should define a queue of processing actions,
> > > >>> so it looks like other device libraries.
> > > >>> And anyway I think a library managing a device class,
> > > >>> and having some device drivers deserves the name of device library.
> > > >>>
> > > >>> I would like to read more opinions.
> > > >>
> > > >> Since the library is an unified interface to GPU device drivers
> > > >> I think it should be named as in the patch - gpudev.
> > > >>
> > > >> Mempool looks like an exception here - initially it was pure SW
> > > >> library, but not there are HW backends and corresponding device
> > > >> drivers.
> > > >>
> > > >> What I don't understand where is GPU specifics here?
> > > >
> > > > That's an interesting question.
> > > > Let's ask first what is a GPU for DPDK?
> > > > I think it is like a sub-CPU with high parallel execution capabilities,
> > > > and it is controlled by the CPU.
> > >
> > > I have no good ideas how to name it in accordance with
> > > above description to avoid "G" which for "Graphics" if
> > > understand correctly. However, may be it is not required.
> > > No strong opinion on the topic, but unbinding from
> > > "Graphics" would be nice.
> >
> > That's a question I ask myself for months now.
> > I am not able to find a better name,
> > and I start thinking that "GPU" is famous enough in high-load computing
> > to convey the idea of what we can expect.
> 
> 
> The closest I can think of is big-little architecture in ARM SoC.
> https://www.arm.com/why-arm/technologies/big-little
> 
> We do have similar architecture, Where the "coprocessor" is part of
> the main CPU.
> It is operations are:
> - Download firmware
> - Memory mapping for Main CPU memory by the co-processor
> - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.

Yes it looks like the exact same scope.
I like the word "co-processor" in this context.

> If your scope is something similar and No Graphics involved here then
> we can remove G.

Indeed no graphics in DPDK :)
By removing the G, you mean keeping only PU? like "pudev"?
We could also define the G as "General".

> Coincidentally, Yesterday, I had an interaction with Elena for the
> same for BaseBand related work in ORAN where
> GPU used as Baseband processing instead of Graphics.(So I can
> understand the big picture of this library)

Yes baseband processing is one possible usage of GPU with DPDK.
We could also imagine some security analysis, or any machine learning...

> I can think of "coprocessor-dev" as one of the name.

"coprocessor" looks too long as prefix of the functions.

> We do have similar machine learning co-processors(for compute)
> if we can keep a generic name and it is for the above functions we may
> use this subsystem as well in the future.

Yes that's the idea to share a common synchronization mechanism
with different HW.

That's cool to have such a big interest in the community for this patch.




Re: [dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW scanning func

2021-06-04 Thread Honnappa Nagarahalli

> 
> Add the logic to determine how many DD bits have been set for contiguous
> packets, for removing the SMP barrier while reading descs.
Are there any performance numbers with this change?

> 
> Signed-off-by: Joyce Kong 
> Reviewed-by: Ruifeng Wang 
> ---
>  drivers/net/i40e/i40e_rxtx.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> 6c58decec..410a81f30 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -452,7 +452,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
>   uint16_t pkt_len;
>   uint64_t qword1;
>   uint32_t rx_status;
> - int32_t s[I40E_LOOK_AHEAD], nb_dd;
> + int32_t s[I40E_LOOK_AHEAD], var, nb_dd;
>   int32_t i, j, nb_rx = 0;
>   uint64_t pkt_flags;
>   uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl; @@ -482,11
> +482,14 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
>   I40E_RXD_QW1_STATUS_SHIFT;
>   }
> 
> - rte_smp_rmb();
> -
>   /* Compute how many status bits were set */
> - for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
> - nb_dd += s[j] & (1 <<
> I40E_RX_DESC_STATUS_DD_SHIFT);
> + for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {
> + var = s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> + if (var)
> + nb_dd += 1;
> + else
> + break;
> + }
> 
>   nb_rx += nb_dd;
> 
> --
> 2.17.1



Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Wang, Haiyue
> -Original Message-
> From: Thomas Monjalon 
> Sent: Friday, June 4, 2021 22:06
> To: Wang, Haiyue 
> Cc: dev@dpdk.org; Elena Agostini ; 
> andrew.rybche...@oktetlabs.ru; Yigit, Ferruh
> ; jer...@marvell.com
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 15:25, Wang, Haiyue:
> > From: Thomas Monjalon 
> > > Another question is about the function rte_gpu_free().
> > > How do we recognize that a memory chunk is from the CPU and GPU visible,
> > > or just from GPU?
> > >
> >
> > I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
> > comment just says: deallocate a chunk of memory allocated with 
> > rte_gpu_malloc*
> >
> > Looks like the rte_gpu_free can handle this case ?
> 
> This is the proposal, yes.
> 
> > And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
> > free needs to check whether this memory belong to the GPU or not, so it
> > also can recognize the memory type, I think.
> 
> Yes that's the idea behind having a single free function.
> We could have some metadata in front of the memory chunk.
> My question is to confirm whether it is a good design or not,
> and whether it should be driver specific or have a common struct in the lib.
> 
> Opinions?
> 

Make the GPU memory be registered into the common lib API with metadata
like address, size etc., and also some GPU-specific callbacks, e.g. to handle how
to make GPU memory visible to the CPU?

And the memory registration could be like the existing external memory function:

int
rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
unsigned int n_pages, size_t page_sz)
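
A rough sketch of the "metadata in front of the memory chunk" idea being
discussed (purely hypothetical layout and names, nothing below is an
existing DPDK API):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical header placed just before the pointer returned to the
     * application, so that a single free() entry point can tell GPU-only
     * memory from CPU-visible memory. */
    struct gpu_mem_hdr {
            uint16_t gpu_id;       /* owning device */
            uint8_t  cpu_visible;  /* 1 if the chunk is mapped for CPU access */
            size_t   size;         /* size of the user-visible allocation */
    };

    static inline struct gpu_mem_hdr *
    gpu_mem_hdr_of(void *ptr)
    {
            /* the header sits immediately before the user pointer */
            return (struct gpu_mem_hdr *)((char *)ptr -
                                          sizeof(struct gpu_mem_hdr));
    }

Whether such metadata lives in the library or in each driver is exactly
the open question above.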



Re: [dpdk-dev] [PATCH] gpudev: introduce memory API

2021-06-04 Thread Wang, Haiyue
> -Original Message-
> From: dev  On Behalf Of Thomas Monjalon
> Sent: Friday, June 4, 2021 23:51
> To: Jerin Jacob 
> Cc: Honnappa Nagarahalli ; Andrew Rybchenko
> ; Yigit, Ferruh ; 
> dpdk-dev ;
> Elena Agostini ; David Marchand 
> 
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 17:20, Jerin Jacob:
> > On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon  wrote:
> > > 04/06/2021 15:59, Andrew Rybchenko:
> > > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > >  On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon 
> > > >   wrote:
> > > > > 03/06/2021 11:33, Ferruh Yigit:
> > > > >> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > >>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon 
> > > > >>>  wrote:
> > > >  +  [gpudev] (@ref rte_gpudev.h),
> > > > >>>
> > > > >>> Since this device does not have a queue etc? Shouldn't make it a
> > > > >>> library like mempool with vendor-defined ops?
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> Current RFC announces additional memory allocation capabilities, 
> > > > >> which can suits
> > > > >> better as extension to existing memory related library instead 
> > > > >> of a new device
> > > > >> abstraction library.
> > > > >
> > > > > It is not replacing mempool.
> > > > > It is more at the same level as EAL memory management:
> > > > > allocate simple buffer, but with the exception it is done
> > > > > on a specific device, so it requires a device ID.
> > > > >
> > > > > The other reason it needs to be a full library is that
> > > > > it will start a workload on the GPU and get completion 
> > > > > notification
> > > > > so we can integrate the GPU workload in a packet processing 
> > > > > pipeline.
> > > > 
> > > >  I might have confused you. My intention is not to make to fit 
> > > >  under mempool API.
> > > > 
> > > >  I agree that we need a separate library for this. My objection is 
> > > >  only
> > > >  to not call libgpudev and
> > > >  call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev 
> > > >  as
> > > >  it not like existing "device libraries" in DPDK and
> > > >  it like other "libraries" in DPDK.
> > > > >>>
> > > > >>> I think we should define a queue of processing actions,
> > > > >>> so it looks like other device libraries.
> > > > >>> And anyway I think a library managing a device class,
> > > > >>> and having some device drivers deserves the name of device library.
> > > > >>>
> > > > >>> I would like to read more opinions.
> > > > >>
> > > > >> Since the library is an unified interface to GPU device drivers
> > > > >> I think it should be named as in the patch - gpudev.
> > > > >>
> > > > >> Mempool looks like an exception here - initially it was pure SW
> > > > >> library, but not there are HW backends and corresponding device
> > > > >> drivers.
> > > > >>
> > > > >> What I don't understand where is GPU specifics here?
> > > > >
> > > > > That's an interesting question.
> > > > > Let's ask first what is a GPU for DPDK?
> > > > > I think it is like a sub-CPU with high parallel execution 
> > > > > capabilities,
> > > > > and it is controlled by the CPU.
> > > >
> > > > I have no good ideas how to name it in accordance with
> > > > above description to avoid "G" which for "Graphics" if
> > > > understand correctly. However, may be it is not required.
> > > > No strong opinion on the topic, but unbinding from
> > > > "Graphics" would be nice.
> > >
> > > That's a question I ask myself for months now.
> > > I am not able to find a better name,
> > > and I start thinking that "GPU" is famous enough in high-load computing
> > > to convey the idea of what we can expect.
> >
> >
> > The closest I can think of is big-little architecture in ARM SoC.
> > https://www.arm.com/why-arm/technologies/big-little
> >
> > We do have similar architecture, Where the "coprocessor" is part of
> > the main CPU.
> > It is operations are:
> > - Download firmware
> > - Memory mapping for Main CPU memory by the co-processor
> > - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.
> 
> Yes it looks like the exact same scope.
> I like the word "co-processor" in this context.
> 
> > If your scope is something similar and No Graphics involved here then
> > we can remove G.
> 
> Indeed no graphics in DPDK :)
> By removing the G, you mean keeping only PU? like "pudev"?
> We could also define the G as "General".
> 
> > Coincidentally, Yesterday, I had an interaction with Elena for the
> > same for BaseBand related work in ORAN where
> > GPU used as Baseband processing instead of Graphics.(So I can
> > understand the big picture of this library)
> 
> Yes baseband processing is one possible usage of GPU with DPDK.
> We could

Re: [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test

2021-06-04 Thread Stephen Hemminger
On Fri,  4 Jun 2021 04:46:16 -0500
Joyce Kong  wrote:

> Since C11 memory model is adopted in DPDK now[1], use GCC's 
> atomic builtins in test cases.
> 
> [1]https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
> 
> Joyce Kong (8):
>   test/ticketlock: use GCC atomic builtins for lcores sync
>   test/spinlock: use GCC atomic builtins for lcores sync
>   test/rwlock: use GCC atomic builtins for lcores sync
>   test/mcslock: use GCC atomic builtins for lcores sync
>   test/mempool: remove unused variable for lcores sync
>   test/mempool_perf: use GCC atomic builtins for lcores sync
>   test/service_cores: use GCC atomic builtins for lock sync
>   test/rcu_perf: use GCC atomic builtins for data sync
> 
>  app/test/test_mcslock.c   | 13 +++--
>  app/test/test_mempool.c   |  5 --
>  app/test/test_mempool_perf.c  | 12 ++---
>  app/test/test_rcu_qsbr_perf.c | 98 +--
>  app/test/test_rwlock.c|  9 ++--
>  app/test/test_service_cores.c | 36 +++--
>  app/test/test_spinlock.c  | 10 ++--
>  app/test/test_ticketlock.c|  9 ++--
>  8 files changed, 93 insertions(+), 99 deletions(-)
> 

Thanks, I did this for pflock tests during review cycle

Acked-by: Stephen Hemminger 


Re: [dpdk-dev] [PATCH v2 2/2] eal: handle compressed firmwares

2021-06-04 Thread Dmitry Kozlyuk
2021-06-04 09:27 (UTC+0200), David Marchand:
> On Fri, Jun 4, 2021 at 12:29 AM Dmitry Kozlyuk  
> wrote:
> >
> > 2021-06-03 18:55 (UTC+0200), David Marchand:
> > [...]  
> > > diff --git a/config/meson.build b/config/meson.build
> > > index 017bb2efbb..c6985139b4 100644
> > > --- a/config/meson.build
> > > +++ b/config/meson.build
> > > @@ -172,6 +172,13 @@ if libexecinfo.found() and 
> > > cc.has_header('execinfo.h')
> > >  dpdk_extra_ldflags += '-lexecinfo'
> > >  endif
> > >
> > > +libarchive = dependency('libarchive', required: false, method: 
> > > 'pkg-config')
> > > +if libarchive.found()
> > > +dpdk_conf.set('RTE_HAS_LIBARCHIVE', 1)
> > > +add_project_link_arguments('-larchive', language: 'c')
> > > +dpdk_extra_ldflags += '-larchive'
> > > +endif
> > > +  
> >
> > Suggestion:
> >
> > diff --git a/config/meson.build b/config/meson.build
> > index c6985139b4..c3668798c1 100644
> > --- a/config/meson.build
> > +++ b/config/meson.build
> > @@ -175,7 +175,6 @@ endif
> >  libarchive = dependency('libarchive', required: false, method: 
> > 'pkg-config')
> >  if libarchive.found()
> >  dpdk_conf.set('RTE_HAS_LIBARCHIVE', 1)
> > -add_project_link_arguments('-larchive', language: 'c')
> >  dpdk_extra_ldflags += '-larchive'
> >  endif
> >
> > diff --git a/lib/eal/meson.build b/lib/eal/meson.build
> > index 1722924f67..5a018d97d6 100644
> > --- a/lib/eal/meson.build
> > +++ b/lib/eal/meson.build
> > @@ -16,6 +16,7 @@ subdir(exec_env)
> >  subdir(arch_subdir)
> >
> >  deps += ['kvargs']
> > +ext_deps += libarchive
> >  if not is_windows
> >  deps += ['telemetry']
> >  endif
> >  
> 
> I had tried something close when preparing v2 (only keeping
> RTE_HAS_LIBARCHIVE in config/meson.build and putting extra_ldflags and
> ext_deps in lib/eal/unix/meson.build) but both my try and your
> suggestion break static compilation for the helloworld example.
> 
> 
> $ ./devtools/test-meson-builds.sh -vv
> ...
> ## Building helloworld
[snip]

Thanks for the details.
Indeed, libarchive.pc lists all libraries present at libarchive build time
in Libs.private, even though statically linking libarchive doesn't require them.
We'll have to go your way, sorry for the misdirection.
Maybe it's worth a comment.

From libarchive README:

I've attempted to minimize static link pollution. If you don't
explicitly invoke a particular feature (such as support for a
particular compression or format), it won't get pulled in to
statically-linked programs. In particular, if you don't explicitly
enable a particular compression or decompression support, you won't
need to link against the corresponding compression or decompression
libraries. This also reduces the size of statically-linked binaries
in environments where that matters.



[dpdk-dev] [PATCH v8 00/10] eal: Add EAL API for threading

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so it currently
relies on a header file that hides the Windows calls under
pthread matched interfaces. Given that EAL should isolate the environment
specifics from the applications and libraries and mediate
all the communication with the operating systems, a new EAL interface
is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread_types.h (librte_eal/include)
* rte_thread_windows_types.h (librte_eal/windows/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

For flexibility, the user is offered the option of either using the 
RTE_THREAD_* API or
a 3rd party thread library, through a meson flag “use_external_thread_lib”.
By default, this flag is set to FALSE, which means Windows libraries and 
applications
will use the RTE_THREAD_* API for managing threads.

If compiling on Windows and the “use_external_thread_lib” is *not* set,
the following files will be parsed: 
* include/rte_thread.h
* windows/include/rte_thread_windows_types.h
* windows/rte_thread.c
In all other cases, the compilation/parsing includes the following files:
* include/rte_thread.h 
* include/rte_thread_types.h
* common/rte_thread.c

**A schematic example of the design**
--
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
return CreateThread();
}

lib/librte_eal/windows/meson.build
if get_option('use_external_thread_lib')
sources += 'librte_eal/common/rte_thread.c'
else
sources += 'librte_eal/windows/rte_thread.c'
endif
-

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
enum rte_thread_priority priority;
rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an rte_thread_attr_t
object that will cause the thread to be created with the affinity and priority
described by the attributes object. If no rte_thread_attr_t is passed
(parameter is NULL), the default affinity and priority are used.
An rte_thread_attr_t object can also be set to the default values
by calling *rte_thread_attr_init()*.
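
To make the intended usage concrete, here is a minimal sketch of creating
a worker thread with these attributes (error handling trimmed; the worker
function and the chosen core number are only illustrative):

static void *worker(void *arg)
{
    /* ... application work ... */
    return arg;
}

rte_thread_attr_t attr;
rte_cpuset_t cpuset;
rte_thread_t tid;

rte_thread_attr_init(&attr);
CPU_ZERO(&cpuset);
CPU_SET(2, &cpuset);                /* pin the worker to core 2 */
rte_thread_attr_set_affinity(&attr, &cpuset);
rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL);
rte_thread_create(&tid, &attr, worker, NULL);
rte_thread_join(tid, NULL);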

*Priority* is represented through an enum that currently advertises
two values for priority:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority  - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
   with a new value for priority

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, the administrator can set one of the available options:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 

*Affinity* is described by the already known “rte_cpuset_t” type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
   rte_thread_attr_t object
rte_thread_set/get_affinity  – sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 

**Future work**
Note that this patchset was focused on introducing new API that will
remove the Windows pthread.h shim. In DPDK, there are still a few references
to pthread_* that were not implemented in the shim.
The long term plan is for EAL to provide full threading support:
* Adding support for conditional variables
* Additional functionality offered by pthread_* (such as pthread_setname_np, 
etc.)
* Static mutex initializers are not used on Windows. If we must continue
  using them, they need to be platform dependent and an implementation will
  need to be provided for Windows.

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when user did not specify a 

[dpdk-dev] [PATCH v8 01/10] eal: add thread id and simple thread functions

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Use a portable, type-safe representation for the thread identifier.
Add functions for comparing thread ids and obtaining the thread id
for the current thread.
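
For illustration, a minimal usage sketch of the new identifiers (the
main_id variable and the message are only examples, not part of the patch):

rte_thread_t main_id = rte_thread_self();
/* ... later, possibly deeper in the call stack ... */
if (rte_thread_equal(rte_thread_self(), main_id))
    printf("still running on the main thread\n");
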
---
 lib/eal/common/rte_thread.c   | 105 ++
 lib/eal/include/rte_thread.h  |  53 +++--
 lib/eal/include/rte_thread_types.h|  10 ++
 .../include/rte_windows_thread_types.h|  10 ++
 lib/eal/windows/rte_thread.c  |  17 +++
 5 files changed, 186 insertions(+), 9 deletions(-)
 create mode 100644 lib/eal/common/rte_thread.c
 create mode 100644 lib/eal/include/rte_thread_types.h
 create mode 100644 lib/eal/windows/include/rte_windows_thread_types.h

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
new file mode 100644
index 00..1292f7a8f8
--- /dev/null
+++ b/lib/eal/common/rte_thread.c
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+struct eal_tls_key {
+   pthread_key_t thread_index;
+};
+
+rte_thread_t
+rte_thread_self(void)
+{
+   rte_thread_t thread_id = { 0 };
+
+   thread_id.opaque_id = pthread_self();
+
+   return thread_id;
+}
+
+int
+rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
+{
+   return pthread_equal(t1.opaque_id, t2.opaque_id);
+}
+
+int
+rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
+{
+   int err;
+   rte_thread_key k;
+
+   k = malloc(sizeof(*k));
+   if (k == NULL) {
+   RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_key_create(&(k->thread_index), destructor);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_key_create failed: %s\n",
+strerror(err));
+   free(k);
+   return err;
+   }
+   *key = k;
+   return 0;
+}
+
+int
+rte_thread_key_delete(rte_thread_key key)
+{
+   int err;
+
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_key_delete(key->thread_index);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_key_delete failed: %s\n",
+strerror(err));
+   free(key);
+   return err;
+   }
+   free(key);
+   return 0;
+}
+
+int
+rte_thread_value_set(rte_thread_key key, const void *value)
+{
+   int err;
+
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_setspecific(key->thread_index, value);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_setspecific failed: %s\n",
+   strerror(err));
+   return err;
+   }
+   return 0;
+}
+
+void *
+rte_thread_value_get(rte_thread_key key)
+{
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   return pthread_getspecific(key->thread_index);
+}
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8be8ed8f36..347df1a6ae 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
  */
+#include 
 
 #include 
 #include 
@@ -20,11 +22,50 @@
 extern "C" {
 #endif
 
+#include 
+#if defined(RTE_USE_WINDOWS_THREAD_TYPES)
+#include 
+#else
+#include 
+#endif
+
+/**
+ * Thread id descriptor.
+ */
+typedef struct rte_thread_tag {
+   uintptr_t opaque_id; /**< thread identifier */
+} rte_thread_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
 typedef struct eal_tls_key *rte_thread_key;
 
+/**
+ * Get the id of the calling thread.
+ *
+ * @return
+ *   Return the thread id of the calling thread.
+ */
+__rte_experimental
+rte_thread_t rte_thread_self(void);
+
+/**
+ * Check if 2 thread ids are equal.
+ *
+ * @param t1
+ *   First thread id.
+ *
+ * @param t2
+ *   Second thread id.
+ *
+ * @return
+ *   If the ids are equal, return nonzero.
+ *   Otherwise, return 0.
+ */
+__rte_experimental
+int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
+
 #ifdef RTE_HAS_CPUSET
 
 /**
@@ -63,9 +104,7 @@ void rte_thread_get_affinity(rte_cpuset_t *cpusetp);
  *
  * @return
  *   On success, zero.
- *   On failure, a negative number and an error number is set in rte_errno.
- *   rte_errno can be: ENOMEM  - Memory allocation error.
- * ENOEXEC - Specific OS error.
+ *   On failure, return a positive errno-style error number.
  */
 
 __rte_experimental
@@ -80,9 +119,7 @@ int rte_thread_key_create(rte_

[dpdk-dev] [PATCH v8 02/10] eal: add thread attributes

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement thread attributes for:
* thread affinity
* thread priority
Implement functions for managing thread attributes.

Priority is represented through an enum that allows for two levels:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL

Affinity is described by the already known “rte_cpuset_t” type.

An rte_thread_attr_t object can be set to the default values
by calling *rte_thread_attr_init()*.
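
A short usage sketch of the attribute accessors (variable names are only
illustrative):

rte_thread_attr_t attr;
rte_cpuset_t cpuset;

rte_thread_attr_init(&attr);        /* normal priority, empty cpuset */
rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_REALTIME_CRITICAL);
rte_thread_attr_get_affinity(&attr, &cpuset);   /* reads back the (still empty) cpuset */
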
---
 lib/eal/common/rte_thread.c   | 51 +++
 lib/eal/include/rte_thread.h  | 89 +++
 lib/eal/include/rte_thread_types.h|  3 +
 .../include/rte_windows_thread_types.h|  3 +
 lib/eal/windows/rte_thread.c  | 53 +++
 5 files changed, 199 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 1292f7a8f8..4b1e8f995e 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,56 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal(t1.opaque_id, t2.opaque_id);
 }
 
+int
+rte_thread_attr_init(rte_thread_attr_t *attr)
+{
+   RTE_ASSERT(attr != NULL);
+
+   CPU_ZERO(&attr->cpuset);
+   attr->priority = RTE_THREAD_PRIORITY_NORMAL;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   if (thread_attr == NULL || cpuset == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid thread attributes parameter\n");
+   return EINVAL;
+   }
+   thread_attr->cpuset = *cpuset;
+   return 0;
+}
+
+int
+rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   if ((thread_attr == NULL) || (cpuset == NULL)) {
+   RTE_LOG(DEBUG, EAL, "Invalid thread attributes parameter\n");
+   return EINVAL;
+   }
+
+   *cpuset = thread_attr->cpuset;
+   return 0;
+}
+
+int
+rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
+enum rte_thread_priority priority)
+{
+   if (thread_attr == NULL) {
+   RTE_LOG(DEBUG, EAL,
+   "Unable to set priority attribute, invalid 
parameter\n");
+   return EINVAL;
+   }
+
+   thread_attr->priority = priority;
+   return 0;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 347df1a6ae..eff00023d7 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -36,6 +36,26 @@ typedef struct rte_thread_tag {
uintptr_t opaque_id; /**< thread identifier */
 } rte_thread_t;
 
+/**
+ * Thread priority values.
+ */
+enum rte_thread_priority {
+   RTE_THREAD_PRIORITY_UNDEFINED = 0,
+   /**< priority hasn't been defined */
+   RTE_THREAD_PRIORITY_NORMAL= 1,
+   /**< normal thread priority, the default */
+   RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 2,
+   /**< highest thread priority allowed */
+};
+
+/**
+ * Representation for thread attributes.
+ */
+typedef struct {
+   enum rte_thread_priority priority; /**< thread priority */
+   rte_cpuset_t cpuset; /**< thread affinity */
+} rte_thread_attr_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -66,6 +86,75 @@ rte_thread_t rte_thread_self(void);
 __rte_experimental
 int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
+/**
+ * Initialize the attributes of a thread.
+ * These attributes can be passed to the rte_thread_create() function
+ * that will create a new thread and set its attributes according to attr.
+ *
+ * @param attr
+ *   Thread attributes to initialize.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_init(rte_thread_attr_t *attr);
+
+/**
+ * Set the CPU affinity value in the thread attributes pointed to
+ * by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes in which affinity will be updated.
+ *
+ * @param cpuset
+ *   Points to the value of the affinity to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * Get the value of CPU affinity that is set in the thread attributes pointed
+ * to by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes from which affinity will be retrieved.
+ *
+ * @param cpuset
+ *   Pointer to the memory that will store the affinity.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style er

[dpdk-dev] [PATCH v8 03/10] eal/windows: translate Windows errors to errno-style errors

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add function to translate Windows error codes to
errno-style error codes. The possible return values are chosen
so that we have as much semantic compatibility between platforms as
possible.
---
 lib/eal/include/rte_thread.h |  5 +-
 lib/eal/windows/rte_thread.c | 90 +++-
 2 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index eff00023d7..f3eeb28753 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -236,9 +236,8 @@ int rte_thread_value_set(rte_thread_key key, const void 
*value);
  *
  * @return
  *   On success, value data pointer (can also be NULL).
- *   On failure, NULL and an error number is set in rte_errno.
- *   rte_errno can be: EINVAL  - Invalid parameter passed.
- * ENOEXEC - Specific OS error.
+ *   On failure, NULL and a positive error number is set in rte_errno.
+ *
  */
 __rte_experimental
 void *rte_thread_value_get(rte_thread_key key);
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index cc319d3628..6ea1dc2a05 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -13,6 +13,54 @@ struct eal_tls_key {
DWORD thread_index;
 };
 
+/* Translates the most common error codes related to threads */
+static int
+thread_translate_win32_error(DWORD error)
+{
+   switch (error) {
+   case ERROR_SUCCESS:
+   return 0;
+
+   case ERROR_INVALID_PARAMETER:
+   return EINVAL;
+
+   case ERROR_INVALID_HANDLE:
+   return EFAULT;
+
+   case ERROR_NOT_ENOUGH_MEMORY:
+   /* FALLTHROUGH */
+   case ERROR_NO_SYSTEM_RESOURCES:
+   return ENOMEM;
+
+   case ERROR_PRIVILEGE_NOT_HELD:
+   /* FALLTHROUGH */
+   case ERROR_ACCESS_DENIED:
+   return EACCES;
+
+   case ERROR_ALREADY_EXISTS:
+   return EEXIST;
+
+   case ERROR_POSSIBLE_DEADLOCK:
+   return EDEADLK;
+
+   case ERROR_INVALID_FUNCTION:
+   /* FALLTHROUGH */
+   case ERROR_CALL_NOT_IMPLEMENTED:
+   return ENOSYS;
+   }
+
+   return EINVAL;
+}
+
+static int
+thread_log_last_error(const char* message)
+{
+   DWORD error = GetLastError();
+   RTE_LOG(DEBUG, EAL, "GetLastError()=%lu: %s\n", error, message);
+
+   return thread_translate_win32_error(error);
+}
+
 rte_thread_t
 rte_thread_self(void)
 {
@@ -85,18 +133,18 @@ int
 rte_thread_key_create(rte_thread_key *key,
__rte_unused void (*destructor)(void *))
 {
+   int ret;
+
*key = malloc(sizeof(**key));
if ((*key) == NULL) {
RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
-   rte_errno = ENOMEM;
-   return -1;
+   return ENOMEM;
}
(*key)->thread_index = TlsAlloc();
if ((*key)->thread_index == TLS_OUT_OF_INDEXES) {
-   RTE_LOG_WIN32_ERR("TlsAlloc()");
+   ret = thread_log_last_error("TlsAlloc()");
free(*key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return ret;
}
return 0;
 }
@@ -104,16 +152,16 @@ rte_thread_key_create(rte_thread_key *key,
 int
 rte_thread_key_delete(rte_thread_key key)
 {
-   if (!key) {
+   int ret;
+
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
if (!TlsFree(key->thread_index)) {
-   RTE_LOG_WIN32_ERR("TlsFree()");
+   ret = thread_log_last_error("TlsFree()");
free(key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return ret;
}
free(key);
return 0;
@@ -122,19 +170,17 @@ rte_thread_key_delete(rte_thread_key key)
 int
 rte_thread_value_set(rte_thread_key key, const void *value)
 {
+   int ret;
char *p;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
/* discard const qualifier */
p = (char *) (uintptr_t) value;
if (!TlsSetValue(key->thread_index, p)) {
-   RTE_LOG_WIN32_ERR("TlsSetValue()");
-   rte_errno = ENOEXEC;
-   return -1;
+   return thread_log_last_error("TlsSetValue()");
}
return 0;
 }
@@ -143,16 +189,18 @@ void *
 rte_thread_value_get(rte_thread_key key)
 {
void *output;
+   DWORD ret = 0;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
rte_errno = EINVAL;
return NULL;
}
output = TlsGetValue(key->thread_index);
-   if (GetLastError() !

[dpdk-dev] [PATCH v8 05/10] eal: implement thread priority management functions

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add function for setting the priority for a thread.
Priorities on multiple platforms are similarly determined by
a priority value and a priority class/policy.

On Linux, the following mapping is created:
RTE_THREAD_PRIORITY_NORMAL corresponds to
* policy SCHED_OTHER
* priority value:   (sched_get_priority_min(SCHED_OTHER) +
 sched_get_priority_max(SCHED_OTHER))/2;
RTE_THREAD_PRIORITY_REALTIME_CRITICAL corresponds to
* policy SCHED_RR
* priority value: sched_get_priority_max(SCHED_RR);

On Windows, the following mapping is created:
RTE_THREAD_PRIORITY_NORMAL corresponds to
* class NORMAL_PRIORITY_CLASS
* priority THREAD_PRIORITY_NORMAL
RTE_THREAD_PRIORITY_REALTIME_CRITICAL corresponds to
* class REALTIME_PRIORITY_CLASS
* priority THREAD_PRIORITY_TIME_CRITICAL
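
For example, raising the calling thread to the real-time class would look
roughly like this (sketch only; on Linux this maps to SCHED_RR and usually
requires the corresponding privileges):

int ret = rte_thread_set_priority(rte_thread_self(),
                RTE_THREAD_PRIORITY_REALTIME_CRITICAL);
if (ret != 0)
    RTE_LOG(ERR, EAL, "Cannot set realtime priority: %s\n", strerror(ret));
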
---
 lib/eal/common/rte_thread.c   | 51 ++
 lib/eal/include/rte_thread.h  | 17 
 lib/eal/include/rte_thread_types.h|  3 -
 .../include/rte_windows_thread_types.h|  3 -
 lib/eal/windows/rte_thread.c  | 92 +++
 5 files changed, 160 insertions(+), 6 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index ceb27feaa7..5cee19bb7d 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -48,6 +48,57 @@ rte_thread_get_affinity_by_id(rte_thread_t thread_id,
return pthread_getaffinity_np(thread_id.opaque_id, sizeof(*cpuset), 
cpuset);
 }
 
+static int
+thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri, 
int *pol)
+{
+   RTE_VERIFY(os_pri != NULL);
+   RTE_VERIFY(pol != NULL);
+
+   /* Clear the output parameters */
+   *os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
+   *pol = -1;
+
+   switch (eal_pri)
+   {
+   case RTE_THREAD_PRIORITY_NORMAL:
+   *pol = SCHED_OTHER;
+
+   /*
+* Choose the middle of the range to represent
+* the priority 'normal'.
+* On Linux, this should be 0, since both
+* sched_get_priority_min/_max return 0 for SCHED_OTHER.
+*/
+   *os_pri = (sched_get_priority_min(SCHED_OTHER) +
+   sched_get_priority_max(SCHED_OTHER))/2;
+   break;
+   case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+   *pol = SCHED_RR;
+   *os_pri = sched_get_priority_max(SCHED_RR);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "The requested priority value is 
invalid.\n");
+   return EINVAL;
+   }
+   return 0;
+}
+
+int
+rte_thread_set_priority(rte_thread_t thread_id,
+   enum rte_thread_priority priority)
+{
+   int ret;
+   int policy;
+   struct sched_param param;
+
+   ret = thread_map_priority_to_os_value(priority, &param.sched_priority, 
&policy);
+   if (ret != 0) {
+   return ret;
+   }
+
+   return pthread_setschedparam(thread_id.opaque_id, policy, &param);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 1f02962146..5c54cd9d67 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -122,6 +122,23 @@ __rte_experimental
 int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
rte_cpuset_t *cpuset);
 
+/**
+ * Set the priority of a thread.
+ *
+ * @param thread_id
+ *Id of the thread for which to set priority.
+ *
+ * @param priority
+ *   Priority value to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_priority(rte_thread_t thread_id,
+   enum rte_thread_priority priority);
+
 /**
  * Initialize the attributes of a thread.
  * These attributes can be passed to the rte_thread_create() function
diff --git a/lib/eal/include/rte_thread_types.h 
b/lib/eal/include/rte_thread_types.h
index 996232c636..d67b24a563 100644
--- a/lib/eal/include/rte_thread_types.h
+++ b/lib/eal/include/rte_thread_types.h
@@ -7,7 +7,4 @@
 
 #include 
 
-#define EAL_THREAD_PRIORITY_NORMAL   0
-#define EAL_THREAD_PRIORITY_REALTIME_CIRTICAL99
-
 #endif /* _RTE_THREAD_TYPES_H_ */
diff --git a/lib/eal/windows/include/rte_windows_thread_types.h 
b/lib/eal/windows/include/rte_windows_thread_types.h
index 5bdeaad3d4..60e6d94553 100644
--- a/lib/eal/windows/include/rte_windows_thread_types.h
+++ b/lib/eal/windows/include/rte_windows_thread_types.h
@@ -7,7 +7,4 @@
 
 #include 
 
-#define EAL_THREAD_PRIORITY_NORMAL THREAD_PRIORITY_NORMAL
-#define EAL_THREAD_PRIORITY_REALTIME_CIRTICAL  THREAD_PRIORITY_TIME_CRITICAL
-
 #endif /* _RTE_THREAD_TYP

[dpdk-dev] [PATCH v8 04/10] eal: implement functions for thread affinity management

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement functions for getting/setting thread affinity.
Threads can be pinned to specific cores by setting their
affinity attribute.
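
As an illustration of the new calls, pinning the current thread to a single
core could look like this (the core number is arbitrary):

rte_cpuset_t cpuset;

CPU_ZERO(&cpuset);
CPU_SET(3, &cpuset);
rte_thread_set_affinity_by_id(rte_thread_self(), &cpuset);

/* read the affinity back */
rte_thread_get_affinity_by_id(rte_thread_self(), &cpuset);
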
---
 lib/eal/common/rte_thread.c   |  14 +++
 lib/eal/include/rte_thread.h  |  36 
 lib/eal/windows/eal_lcore.c   | 169 +-
 lib/eal/windows/eal_windows.h |  10 ++
 lib/eal/windows/rte_thread.c  | 127 -
 5 files changed, 310 insertions(+), 46 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 4b1e8f995e..ceb27feaa7 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -34,6 +34,20 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal(t1.opaque_id, t2.opaque_id);
 }
 
+int
+rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset)
+{
+   return pthread_setaffinity_np(thread_id.opaque_id, sizeof(*cpuset), 
cpuset);
+}
+
+int
+rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset)
+{
+   return pthread_getaffinity_np(thread_id.opaque_id, sizeof(*cpuset), 
cpuset);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index f3eeb28753..1f02962146 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -86,6 +86,42 @@ rte_thread_t rte_thread_self(void);
 __rte_experimental
 int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
+/**
+ * Set the affinity of thread 'thread_id' to the cpu set
+ * specified by 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to set the affinity.
+ *
+ * @param cpuset
+ *   Pointer to CPU affinity to set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset);
+
+/**
+ * Get the affinity of thread 'thread_id' and store it
+ * in 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to get the affinity.
+ *
+ * @param cpuset
+ *   Pointer for storing the affinity value.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset);
+
 /**
  * Initialize the attributes of a thread.
  * These attributes can be passed to the rte_thread_create() function
diff --git a/lib/eal/windows/eal_lcore.c b/lib/eal/windows/eal_lcore.c
index 476c2d2bdf..519a62b96d 100644
--- a/lib/eal/windows/eal_lcore.c
+++ b/lib/eal/windows/eal_lcore.c
@@ -2,7 +2,6 @@
  * Copyright(c) 2019 Intel Corporation
  */
 
-#include 
 #include 
 #include 
 
@@ -27,13 +26,15 @@ struct socket_map {
 };
 
 struct cpu_map {
-   unsigned int socket_count;
unsigned int lcore_count;
+   unsigned int socket_count;
+   unsigned int cpu_count;
struct lcore_map lcores[RTE_MAX_LCORE];
struct socket_map sockets[RTE_MAX_NUMA_NODES];
+   GROUP_AFFINITY cpus[CPU_SETSIZE];
 };
 
-static struct cpu_map cpu_map = { 0 };
+static struct cpu_map cpu_map;
 
 /* eal_create_cpu_map() is called before logging is initialized */
 static void
@@ -47,13 +48,111 @@ log_early(const char *format, ...)
va_end(va);
 }
 
+static int
+eal_query_group_affinity(void)
+{
+   SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL;
+   DWORD infos_size = 0;
+   int ret = 0;
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, NULL,
+ &infos_size)) {
+   DWORD error = GetLastError();
+   if (error != ERROR_INSUFFICIENT_BUFFER) {
+   log_early("Cannot get group information size, "
+ "error %lu\n", error);
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+   }
+
+   infos = malloc(infos_size);
+   if (infos == NULL) {
+   log_early("Cannot allocate memory for NUMA node information\n");
+   rte_errno = ENOMEM;
+   ret = -1;
+   goto cleanup;
+   }
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, infos,
+ &infos_size)) {
+   log_early("Cannot get group information, error %lu\n",
+ GetLastError());
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+
+   cpu_map.cpu_count = 0;
+   USHORT group_count = infos->Group.ActiveGroupCount;
+   for (USHORT group_number = 0; group_number < group_count; 
group_number++) {
+   KAFFINITY affinity = 
infos->Group.GroupInfo[group_number].ActiveProcessorMask;
+
+ 

[dpdk-dev] [PATCH v8 06/10] eal: add thread lifetime management

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add function for thread creation, join, canceling, detaching.

The *rte_thread_create()* function can optionally receive an rte_thread_attr_t
object that will cause the thread to be created with the affinity and priority
described by the attributes object. If no rte_thread_attr_t is passed
(parameter is NULL), the default affinity and priority are used.

On Windows, the function executed by a thread when the thread starts is
represented by a function pointer of type DWORD (*func) (void*).
On other platforms, the function pointer is a void* (*func) (void*).

Performing a cast between these two types of function pointers to
unify the API on all platforms may result in undefined behavior.
To fix this issue, a wrapper that respects the signature required by
CreateThread() has been created on Windows.
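
The idea behind the wrapper, shown here in simplified form (names and
details are illustrative, not the exact code of this patch):

struct thread_routine_ctx {
    void *(*thread_func)(void *);   /* portable start routine */
    void *routine_args;             /* argument forwarded to it */
};

/* Signature required by CreateThread() */
static DWORD WINAPI
thread_func_wrapper(LPVOID arg)
{
    /* the context is heap-allocated by the creating thread */
    struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;

    free(arg);
    return (DWORD)(uintptr_t)ctx.thread_func(ctx.routine_args);
}
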
---
 lib/eal/common/rte_thread.c  | 116 +
 lib/eal/include/rte_thread.h |  67 +++
 lib/eal/windows/rte_thread.c | 162 +++
 3 files changed, 345 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 5cee19bb7d..84050d0f4c 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -149,6 +149,122 @@ rte_thread_attr_set_priority(rte_thread_attr_t 
*thread_attr,
return 0;
 }
 
+int
+rte_thread_create(rte_thread_t *thread_id,
+ const rte_thread_attr_t *thread_attr,
+ void *(*thread_func)(void *), void *args)
+{
+   int ret = 0;
+   pthread_attr_t attr;
+   pthread_attr_t *attrp = NULL;
+   struct sched_param param = {
+   .sched_priority = 0,
+   };
+   int policy = SCHED_OTHER;
+
+   if (thread_attr != NULL) {
+   ret = pthread_attr_init(&attr);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_init failed\n");
+   goto cleanup;
+   }
+
+   attrp = &attr;
+
+   /*
+* Set the inherit scheduler parameter to explicit,
+* otherwise the priority attribute is ignored.
+*/
+   ret = pthread_attr_setinheritsched(attrp,
+  PTHREAD_EXPLICIT_SCHED);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setinheritsched 
failed\n");
+   goto cleanup;
+   }
+
+   /*
+* In case a realtime scheduling policy is requested,
+* the sched_priority parameter is set to the value stored in
+* thread_attr. Otherwise, for the default scheduling policy
+* (SCHED_OTHER) sched_priority needs to be initialized to 0.
+*/
+   if (thread_attr->priority == 
RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
+   policy = SCHED_RR;
+   param.sched_priority = thread_attr->priority;
+   }
+
+   ret = pthread_attr_setschedpolicy(attrp, policy);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setschedpolicy 
failed\n");
+   goto cleanup;
+   }
+
+   ret = pthread_attr_setschedparam(attrp, &param);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setschedparam 
failed\n");
+   goto cleanup;
+   }
+
+   ret = pthread_attr_setaffinity_np(attrp,
+ sizeof(thread_attr->cpuset),
+ &thread_attr->cpuset);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setaffinity_np 
failed\n");
+   goto cleanup;
+   }
+   }
+
+   ret = pthread_create(&thread_id->opaque_id, attrp, thread_func, args);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_create failed\n");
+   goto cleanup;
+   }
+
+cleanup:
+   if (attrp != NULL)
+   pthread_attr_destroy(&attr);
+
+   return ret;
+}
+
+int
+rte_thread_join(rte_thread_t thread_id, int *value_ptr)
+{
+   int ret = 0;
+   void *res = NULL;
+   void **pres = NULL;
+
+   if (value_ptr != NULL)
+   pres = &res;
+
+   ret = pthread_join(thread_id.opaque_id, pres);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_join failed\n");
+   return ret;
+   }
+
+   if (pres != NULL)
+   *value_ptr = *(int *)(*pres);
+
+   return 0;
+}
+
+int rte_thread_cancel(rte_thread_t thread_id)
+{
+   /*
+* TODO: Behavior is different between POSIX and Windows threads.
+* POSIX threads wait for a cancellation point.
+* Current Windows emulation kills thread at any point.
+*/
+   ret

[dpdk-dev] [PATCH v8 07/10] eal: implement functions for mutex management

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for mutex init, destroy, lock, unlock.

On Linux, static initialization of a mutex is possible
through PTHREAD_MUTEX_INITIALIZER.

Windows does not have a static initializer.
Initialization is only done through InitializeCriticalSection().

To simulate static initialization, a fake initializer has been added:
the rte_thread_mutex_lock() function will verify whether the mutex has been
initialized with this fake initializer and, if so, will perform the real
initialization.
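
Conceptually, the lock path then does something along these lines
(simplified sketch for illustration, not the exact code of this patch,
which also has to consider two threads racing on the first lock):

int
rte_thread_mutex_lock(rte_thread_mutex_t *mutex)
{
    /* A statically "initialized" mutex still carries the fake marker,
     * so perform the real initialization on first use.
     */
    if (mutex->mutex_id == WINDOWS_MUTEX_INITIALIZER)
        rte_thread_mutex_init(mutex);

    EnterCriticalSection(mutex->mutex_id);
    return 0;
}
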
---
 lib/eal/common/rte_thread.c   | 24 ++
 lib/eal/include/rte_thread.h  | 53 
 lib/eal/include/rte_thread_types.h|  4 +
 .../include/rte_windows_thread_types.h|  9 ++
 lib/eal/windows/rte_thread.c  | 83 ++-
 5 files changed, 172 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 84050d0f4c..4d7d9242a9 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -249,6 +249,30 @@ rte_thread_join(rte_thread_t thread_id, int *value_ptr)
return 0;
 }
 
+int
+rte_thread_mutex_init(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_init(mutex, NULL);
+}
+
+int
+rte_thread_mutex_lock(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_lock(mutex);
+}
+
+int
+rte_thread_mutex_unlock(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_unlock(mutex);
+}
+
+int
+rte_thread_mutex_destroy(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_destroy(mutex);
+}
+
 int rte_thread_cancel(rte_thread_t thread_id)
 {
/*
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 1d481b9ad5..db8ef20930 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -248,6 +248,58 @@ int rte_thread_create(rte_thread_t *thread_id,
 __rte_experimental
 int rte_thread_join(rte_thread_t thread_id, int *value_ptr);
 
+/**
+ * Initializes a mutex.
+ *
+ * @param mutex
+ *The mutex to be initialized.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_init(rte_thread_mutex_t *mutex);
+
+/**
+ * Locks a mutex.
+ *
+ * @param mutex
+ *The mutex to be locked.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_lock(rte_thread_mutex_t *mutex);
+
+/**
+ * Unlocks a mutex.
+ *
+ * @param mutex
+ *The mutex to be unlocked.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_unlock(rte_thread_mutex_t *mutex);
+
+/**
+ * Releases all resources associated with a mutex.
+ *
+ * @param mutex
+ *The mutex to be uninitialized.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_destroy(rte_thread_mutex_t *mutex);
+
 /**
  * Terminates a thread.
  *
@@ -283,6 +335,7 @@ int rte_thread_detach(rte_thread_t thread_id);
  *
  * @param cpusetp
  *   Pointer to CPU affinity to set.
+ *
  * @return
  *   On success, return 0; otherwise return -1;
  */
diff --git a/lib/eal/include/rte_thread_types.h 
b/lib/eal/include/rte_thread_types.h
index d67b24a563..7bb0d2948c 100644
--- a/lib/eal/include/rte_thread_types.h
+++ b/lib/eal/include/rte_thread_types.h
@@ -7,4 +7,8 @@
 
 #include 
 
+#define RTE_THREAD_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
+
+typedef pthread_mutex_t rte_thread_mutex_t;
+
 #endif /* _RTE_THREAD_TYPES_H_ */
diff --git a/lib/eal/windows/include/rte_windows_thread_types.h 
b/lib/eal/windows/include/rte_windows_thread_types.h
index 60e6d94553..c6c8502bfb 100644
--- a/lib/eal/windows/include/rte_windows_thread_types.h
+++ b/lib/eal/windows/include/rte_windows_thread_types.h
@@ -7,4 +7,13 @@
 
 #include 
 
+#define WINDOWS_MUTEX_INITIALIZER   (void*)-1
+#define RTE_THREAD_MUTEX_INITIALIZER{WINDOWS_MUTEX_INITIALIZER}
+
+struct thread_mutex_t {
+   void* mutex_id;
+};
+
+typedef struct thread_mutex_t rte_thread_mutex_t;
+
 #endif /* _RTE_THREAD_TYPES_H_ */
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index 5afdd54e15..239aa6be5d 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -470,6 +470,88 @@ rte_thread_join(rte_thread_t thread_id, int *value_ptr)
return ret;
 }
 
+int
+rte_thread_mutex_init(rte_thread_mutex_t *mutex)
+{
+   int ret = 0;
+   CRITICAL_SECTION *m = NULL;
+
+   RTE_VERIFY(mutex != NULL);
+
+   m = calloc(1, sizeof(*m));
+   if (m == NULL) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize mutex. Insufficient 
memory!\n");
+   ret = ENOMEM;
+   goto cleanup;
+   }
+
+   InitializeCriticalSection(m)

[dpdk-dev] [PATCH v8 08/10] eal: implement functions for thread barrier management

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for barrier init, destroy, wait.

A portable type is used to represent a barrier identifier.
The rte_thread_barrier_wait() function returns the same value
on all platforms.
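
Typical usage, as a short sketch (thread creation omitted; N_WORKERS is an
arbitrary example value):

#define N_WORKERS 4

rte_thread_barrier_t barrier;

rte_thread_barrier_init(&barrier, N_WORKERS);

/* executed by each of the N_WORKERS threads */
int ret = rte_thread_barrier_wait(&barrier);
if (ret == RTE_THREAD_BARRIER_SERIAL_THREAD) {
    /* exactly one of the waiting threads sees this value */
}

/* later, once all workers have been joined */
rte_thread_barrier_destroy(&barrier);
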
---
 lib/eal/common/rte_thread.c  | 61 
 lib/eal/include/rte_thread.h | 58 ++
 lib/eal/windows/rte_thread.c | 56 +
 3 files changed, 175 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 4d7d9242a9..0a9813794a 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -273,6 +273,67 @@ rte_thread_mutex_destroy(rte_thread_mutex_t *mutex)
return pthread_mutex_destroy(mutex);
 }
 
+int
+rte_thread_barrier_init(rte_thread_barrier_t *barrier, int count)
+{
+   int ret = 0;
+   pthread_barrier_t *pthread_barrier = NULL;
+
+   RTE_VERIFY(barrier != NULL);
+   RTE_VERIFY(count > 0);
+
+   pthread_barrier = calloc(1, sizeof(*pthread_barrier));
+   if (pthread_barrier == NULL) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize barrier. Insufficient 
memory!\n");
+   ret = ENOMEM;
+   goto cleanup;
+   }
+   ret = pthread_barrier_init(pthread_barrier, NULL, count);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize barrier, ret = %d\n", 
ret);
+   goto cleanup;
+   }
+
+   barrier->barrier_id = pthread_barrier;
+   pthread_barrier = NULL;
+
+cleanup:
+   free(pthread_barrier);
+   return ret;
+}
+
+int rte_thread_barrier_wait(rte_thread_barrier_t *barrier)
+{
+   int ret = 0;
+
+   RTE_VERIFY(barrier != NULL);
+   RTE_VERIFY(barrier->barrier_id != NULL);
+
+   ret = pthread_barrier_wait(barrier->barrier_id);
+   if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
+   ret = RTE_THREAD_BARRIER_SERIAL_THREAD;
+   }
+
+   return ret;
+}
+
+int rte_thread_barrier_destroy(rte_thread_barrier_t *barrier)
+{
+   int ret = 0;
+
+   RTE_VERIFY(barrier != NULL);
+
+   ret = pthread_barrier_destroy(barrier->barrier_id);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "Unable to destroy barrier, ret = %d\n", 
ret);
+   }
+
+   free(barrier->barrier_id);
+   barrier->barrier_id = NULL;
+
+   return ret;
+}
+
 int rte_thread_cancel(rte_thread_t thread_id)
 {
/*
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index db8ef20930..b06443cf23 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -29,6 +29,11 @@ extern "C" {
 #include 
 #endif
 
+/**
+ * Returned by rte_thread_barrier_wait() when call is successful.
+ */
+#define RTE_THREAD_BARRIER_SERIAL_THREAD -1
+
 /**
  * Thread id descriptor.
  */
@@ -56,6 +61,13 @@ typedef struct {
rte_cpuset_t cpuset; /**< thread affinity */
 } rte_thread_attr_t;
 
+/**
+ * Thread barrier representation.
+ */
+typedef struct rte_thread_barrier_tag {
+   void* barrier_id;  /**< barrier identifier */
+} rte_thread_barrier_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -300,6 +312,52 @@ int rte_thread_mutex_unlock(rte_thread_mutex_t *mutex);
 __rte_experimental
 int rte_thread_mutex_destroy(rte_thread_mutex_t *mutex);
 
+/**
+ * Initializes a synchronization barrier.
+ *
+ * @param barrier
+ *A pointer that references the newly created 'barrier' object.
+ *
+ * @param count
+ *The number of threads that must enter the barrier before
+ *the threads can continue execution.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_barrier_init(rte_thread_barrier_t *barrier, int count);
+
+/**
+ * Causes the calling thread to wait at the synchronization barrier 'barrier'.
+ *
+ * @param barrier
+ *The barrier used for synchronizing the threads.
+ *
+ * @return
+ *   Return RTE_THREAD_BARRIER_SERIAL_THREAD for the thread synchronized
+ *  at the barrier.
+ *   Return 0 for all other threads.
+ *   Return a positive errno-style error number, in case of failure.
+ */
+__rte_experimental
+int rte_thread_barrier_wait(rte_thread_barrier_t *barrier);
+
+/**
+ * Releases all resources used by a synchronization barrier
+ * and uninitializes it.
+ *
+ * @param barrier
+ *The barrier to be destroyed.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_barrier_destroy(rte_thread_barrier_t *barrier);
+
 /**
  * Terminates a thread.
  *
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index 239aa6be5d..2e657bbde8 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -552,6 +552,62 @@ rte_thread_mutex_destroy(rte_thread_mutex_t *mutex)
return 0;
 }
 
+int
+rte_thread_barrier_init(rte_thread_barrier_t *barrier, int

[dpdk-dev] [PATCH v8 09/10] eal: add EAL argument for setting thread priority

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Allow the user to choose the thread priority through an EAL
command line argument.

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, the administrator can set one of the available options:
 --thread-prio normal
 --thread-prio realtime

 Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 
---
 lib/eal/common/eal_common_options.c | 28 +++-
 lib/eal/common/eal_internal_cfg.h   |  2 ++
 lib/eal/common/eal_options.h|  2 ++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index ff5861b5f3..9d29696b84 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -107,6 +107,7 @@ eal_long_options[] = {
{OPT_TELEMETRY, 0, NULL, OPT_TELEMETRY_NUM},
{OPT_NO_TELEMETRY,  0, NULL, OPT_NO_TELEMETRY_NUM },
{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+   {OPT_THREAD_PRIORITY,   1, NULL, OPT_THREAD_PRIORITY_NUM},
 
/* legacy options that will be removed in future */
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -1412,6 +1413,24 @@ eal_parse_simd_bitwidth(const char *arg)
return 0;
 }
 
+static int
+eal_parse_thread_priority(const char *arg)
+{
+   struct internal_config *internal_conf =
+   eal_get_internal_configuration();
+   enum rte_thread_priority priority;
+
+   if (!strncmp("normal", arg, sizeof("normal")))
+   priority = RTE_THREAD_PRIORITY_NORMAL;
+   else if (!strncmp("realtime", arg, sizeof("realtime")))
+   priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
+   else
+   return -1;
+
+   internal_conf->thread_priority = priority;
+   return 0;
+}
+
 static int
 eal_parse_base_virtaddr(const char *arg)
 {
@@ -1825,7 +1844,13 @@ eal_parse_common_option(int opt, const char *optarg,
return -1;
}
break;
-
+   case OPT_THREAD_PRIORITY_NUM:
+   if (eal_parse_thread_priority(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameter for --"
+   OPT_THREAD_PRIORITY "\n");
+   return -1;
+   }
+   break;
/* don't know what to do, leave this to caller */
default:
return 1;
@@ -2088,6 +2113,7 @@ eal_common_usage(void)
   "  (can be used multiple times)\n"
   "  --"OPT_VMWARE_TSC_MAP"Use VMware TSC map instead of 
native RDTSC\n"
   "  --"OPT_PROC_TYPE" Type of this process 
(primary|secondary|auto)\n"
+  "  --"OPT_THREAD_PRIORITY"   Set threads priority 
(normal|realtime)\n"
 #ifndef RTE_EXEC_ENV_WINDOWS
   "  --"OPT_SYSLOG"Set syslog facility\n"
 #endif
diff --git a/lib/eal/common/eal_internal_cfg.h 
b/lib/eal/common/eal_internal_cfg.h
index d6c0470eb8..b2996cd65b 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -94,6 +94,8 @@ struct internal_config {
unsigned int no_telemetry; /**< true to disable Telemetry */
struct simd_bitwidth max_simd_bitwidth;
/**< max simd bitwidth path to use */
+   enum rte_thread_priority thread_priority;
+   /**< thread priority to configure */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 7b348e707f..9f5b209f64 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -93,6 +93,8 @@ enum {
OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_THREAD_PRIORITY  "thread-prio"
+   OPT_THREAD_PRIORITY_NUM,
 
/* legacy option that will be removed in future */
 #define OPT_PCI_BLACKLIST "pci-blacklist"
-- 
2.31.0.vfs.0.1



[dpdk-dev] [PATCH v9 00/10] eal: Add EAL API for threading

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so it currently
relies on a header file that hides the Windows calls under
pthread-compatible interfaces. Given that EAL should isolate the environment
specifics from the applications and libraries and mediate
all the communication with the operating systems, a new EAL interface
is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread_types.h (librte_eal/include)
* rte_thread_windows_types.h (librte_eal/windows/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

For flexibility, the user is offered the option of either using the 
RTE_THREAD_* API or
a 3rd party thread library, through a meson flag “use_external_thread_lib”.
By default, this flag is set to FALSE, which means Windows libraries and 
applications
will use the RTE_THREAD_* API for managing threads.

If compiling on Windows and the “use_external_thread_lib” is *not* set,
the following files will be parsed: 
* include/rte_thread.h
* windows/include/rte_thread_windows_types.h
* windows/rte_thread.c
In all other cases, the compilation/parsing includes the following files:
* include/rte_thread.h 
* include/rte_thread_types.h
* common/rte_thread.c

**A schematic example of the design**
--
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
return CreateThread();
}

lib/librte_eal/windows/meson.build
if get_option('use_external_thread_lib')
sources += 'librte_eal/common/rte_thread.c'
else
sources += 'librte_eal/windows/rte_thread.c'
endif
-

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
enum rte_thread_priority priority;
rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an rte_thread_attr_t
object that will cause the thread to be created with the affinity and priority
described by the attributes object. If no rte_thread_attr_t is passed
(parameter is NULL), the default affinity and priority are used.
An rte_thread_attr_t object can also be set to the default values
by calling *rte_thread_attr_init()*.

*Priority* is represented through an enum that currently advertises
two values for priority:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority  - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
   with a new value for priority

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, the administrator can set one of the available options:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 

*Affinity* is described by the already known “rte_cpuset_t” type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
   rte_thread_attr_t object
rte_thread_set/get_affinity  – sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 

**Future work**
Note that this patchset was focused on introducing new API that will
remove the Windows pthread.h shim. In DPDK, there are still a few references
to pthread_* that were not implemented in the shim.
The long term plan is for EAL to provide full threading support:
* Adding support for conditional variables
* Additional functionality offered by pthread_* (such as pthread_setname_np, 
etc.)
* Static mutex initializers are not used on Windows. If we must continue
  using them, they need to be platform dependent and an implementation will
  need to be provided for Windows.

v9:
- Sign patches

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when use

[dpdk-dev] [PATCH v9 01/10] eal: add thread id and simple thread functions

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Use a portable, type-safe representation for the thread identifier.
Add functions for comparing thread ids and obtaining the thread id
for the current thread.

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c   | 105 ++
 lib/eal/include/rte_thread.h  |  53 +++--
 lib/eal/include/rte_thread_types.h|  10 ++
 .../include/rte_windows_thread_types.h|  10 ++
 lib/eal/windows/rte_thread.c  |  17 +++
 5 files changed, 186 insertions(+), 9 deletions(-)
 create mode 100644 lib/eal/common/rte_thread.c
 create mode 100644 lib/eal/include/rte_thread_types.h
 create mode 100644 lib/eal/windows/include/rte_windows_thread_types.h

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
new file mode 100644
index 00..1292f7a8f8
--- /dev/null
+++ b/lib/eal/common/rte_thread.c
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+struct eal_tls_key {
+   pthread_key_t thread_index;
+};
+
+rte_thread_t
+rte_thread_self(void)
+{
+   rte_thread_t thread_id = { 0 };
+
+   thread_id.opaque_id = pthread_self();
+
+   return thread_id;
+}
+
+int
+rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
+{
+   return pthread_equal(t1.opaque_id, t2.opaque_id);
+}
+
+int
+rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
+{
+   int err;
+   rte_thread_key k;
+
+   k = malloc(sizeof(*k));
+   if (k == NULL) {
+   RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_key_create(&(k->thread_index), destructor);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_key_create failed: %s\n",
+strerror(err));
+   free(k);
+   return err;
+   }
+   *key = k;
+   return 0;
+}
+
+int
+rte_thread_key_delete(rte_thread_key key)
+{
+   int err;
+
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_key_delete(key->thread_index);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_key_delete failed: %s\n",
+strerror(err));
+   free(key);
+   return err;
+   }
+   free(key);
+   return 0;
+}
+
+int
+rte_thread_value_set(rte_thread_key key, const void *value)
+{
+   int err;
+
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_setspecific(key->thread_index, value);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_setspecific failed: %s\n",
+   strerror(err));
+   return err;
+   }
+   return 0;
+}
+
+void *
+rte_thread_value_get(rte_thread_key key)
+{
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   return pthread_getspecific(key->thread_index);
+}
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8be8ed8f36..347df1a6ae 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
  */
+#include 
 
 #include 
 #include 
@@ -20,11 +22,50 @@
 extern "C" {
 #endif
 
+#include 
+#if defined(RTE_USE_WINDOWS_THREAD_TYPES)
+#include 
+#else
+#include 
+#endif
+
+/**
+ * Thread id descriptor.
+ */
+typedef struct rte_thread_tag {
+   uintptr_t opaque_id; /**< thread identifier */
+} rte_thread_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
 typedef struct eal_tls_key *rte_thread_key;
 
+/**
+ * Get the id of the calling thread.
+ *
+ * @return
+ *   Return the thread id of the calling thread.
+ */
+__rte_experimental
+rte_thread_t rte_thread_self(void);
+
+/**
+ * Check if 2 thread ids are equal.
+ *
+ * @param t1
+ *   First thread id.
+ *
+ * @param t2
+ *   Second thread id.
+ *
+ * @return
+ *   If the ids are equal, return nonzero.
+ *   Otherwise, return 0.
+ */
+__rte_experimental
+int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
+
 #ifdef RTE_HAS_CPUSET
 
 /**
@@ -63,9 +104,7 @@ void rte_thread_get_affinity(rte_cpuset_t *cpusetp);
  *
  * @return
  *   On success, zero.
- *   On failure, a negative number and an error number is set in rte_errno.
- *   rte_errno can be: ENOMEM  - Memory allocation error.
- * ENOEXEC - Specific OS error.
+ *   On failure, return a positive errno-style error number.
  */
 
 __rte_experimental
@@ -80,9 +119,7 @

[dpdk-dev] [PATCH v9 02/10] eal: add thread attributes

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement thread attributes for:
* thread affinity
* thread priority
Implement functions for managing thread attributes.

Priority is represented through an enum that allows for two levels:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL

Affinity is described by the already known “rte_cpuset_t” type.

An rte_thread_attr_t object can be set to the default values
by calling *rte_thread_attr_init()*.

Signed-off-by: Narcisa Vasile 
---
 lib/eal/common/rte_thread.c   | 51 +++
 lib/eal/include/rte_thread.h  | 89 +++
 lib/eal/include/rte_thread_types.h|  3 +
 .../include/rte_windows_thread_types.h|  3 +
 lib/eal/windows/rte_thread.c  | 53 +++
 5 files changed, 199 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 1292f7a8f8..4b1e8f995e 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,56 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal(t1.opaque_id, t2.opaque_id);
 }
 
+int
+rte_thread_attr_init(rte_thread_attr_t *attr)
+{
+   RTE_ASSERT(attr != NULL);
+
+   CPU_ZERO(&attr->cpuset);
+   attr->priority = RTE_THREAD_PRIORITY_NORMAL;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   if (thread_attr == NULL || cpuset == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid thread attributes parameter\n");
+   return EINVAL;
+   }
+   thread_attr->cpuset = *cpuset;
+   return 0;
+}
+
+int
+rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   if ((thread_attr == NULL) || (cpuset == NULL)) {
+   RTE_LOG(DEBUG, EAL, "Invalid thread attributes parameter\n");
+   return EINVAL;
+   }
+
+   *cpuset = thread_attr->cpuset;
+   return 0;
+}
+
+int
+rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
+enum rte_thread_priority priority)
+{
+   if (thread_attr == NULL) {
+   RTE_LOG(DEBUG, EAL,
+   "Unable to set priority attribute, invalid 
parameter\n");
+   return EINVAL;
+   }
+
+   thread_attr->priority = priority;
+   return 0;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 347df1a6ae..eff00023d7 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -36,6 +36,26 @@ typedef struct rte_thread_tag {
uintptr_t opaque_id; /**< thread identifier */
 } rte_thread_t;
 
+/**
+ * Thread priority values.
+ */
+enum rte_thread_priority {
+   RTE_THREAD_PRIORITY_UNDEFINED = 0,
+   /**< priority hasn't been defined */
+   RTE_THREAD_PRIORITY_NORMAL= 1,
+   /**< normal thread priority, the default */
+   RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 2,
+   /**< highest thread priority allowed */
+};
+
+/**
+ * Representation for thread attributes.
+ */
+typedef struct {
+   enum rte_thread_priority priority; /**< thread priority */
+   rte_cpuset_t cpuset; /**< thread affinity */
+} rte_thread_attr_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -66,6 +86,75 @@ rte_thread_t rte_thread_self(void);
 __rte_experimental
 int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
+/**
+ * Initialize the attributes of a thread.
+ * These attributes can be passed to the rte_thread_create() function
+ * that will create a new thread and set its attributes according to attr.
+ *
+ * @param attr
+ *   Thread attributes to initialize.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_init(rte_thread_attr_t *attr);
+
+/**
+ * Set the CPU affinity value in the thread attributes pointed to
+ * by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes in which affinity will be updated.
+ *
+ * @param cpuset
+ *   Points to the value of the affinity to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * Get the value of CPU affinity that is set in the thread attributes pointed
+ * to by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes from which affinity will be retrieved.
+ *
+ * @param cpuset
+ *   Pointer to the memory that will store the affinity.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.

[dpdk-dev] [PATCH v9 03/10] eal/windows: translate Windows errors to errno-style errors

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add a function to translate Windows error codes to
errno-style error codes. The possible return values are chosen
so that we have as much semantic compatibility between platforms as
possible.
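
(Not part of the patch.) A short caller-side sketch, assuming the error codes are consumed as
documented here, showing how the errno-style return values allow the same error handling on
Linux and Windows; the log type and the stored value are illustrative only:

/*
 * Caller-side sketch (illustration only): with the Windows backend now
 * returning translated errno-style codes, the same error handling works
 * unchanged on both platforms.
 */
#include <string.h>
#include <rte_log.h>
#include <rte_thread.h>

static rte_thread_key tls_key;

static int
init_tls(void)
{
	int ret = rte_thread_key_create(&tls_key, NULL);

	if (ret != 0) {
		RTE_LOG(ERR, USER1, "rte_thread_key_create: %s\n",
			strerror(ret));
		return ret;
	}
	return rte_thread_value_set(tls_key, "per-thread data");
}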

Signed-off-by: Narcisa Vasile 
---
 lib/eal/include/rte_thread.h |  5 +-
 lib/eal/windows/rte_thread.c | 90 +++-
 2 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index eff00023d7..f3eeb28753 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -236,9 +236,8 @@ int rte_thread_value_set(rte_thread_key key, const void *value);
  *
  * @return
  *   On success, value data pointer (can also be NULL).
- *   On failure, NULL and an error number is set in rte_errno.
- *   rte_errno can be: EINVAL  - Invalid parameter passed.
- * ENOEXEC - Specific OS error.
+ *   On failure, NULL and a positive error number is set in rte_errno.
+ *
  */
 __rte_experimental
 void *rte_thread_value_get(rte_thread_key key);
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index cc319d3628..6ea1dc2a05 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -13,6 +13,54 @@ struct eal_tls_key {
DWORD thread_index;
 };
 
+/* Translates the most common error codes related to threads */
+static int
+thread_translate_win32_error(DWORD error)
+{
+   switch (error) {
+   case ERROR_SUCCESS:
+   return 0;
+
+   case ERROR_INVALID_PARAMETER:
+   return EINVAL;
+
+   case ERROR_INVALID_HANDLE:
+   return EFAULT;
+
+   case ERROR_NOT_ENOUGH_MEMORY:
+   /* FALLTHROUGH */
+   case ERROR_NO_SYSTEM_RESOURCES:
+   return ENOMEM;
+
+   case ERROR_PRIVILEGE_NOT_HELD:
+   /* FALLTHROUGH */
+   case ERROR_ACCESS_DENIED:
+   return EACCES;
+
+   case ERROR_ALREADY_EXISTS:
+   return EEXIST;
+
+   case ERROR_POSSIBLE_DEADLOCK:
+   return EDEADLK;
+
+   case ERROR_INVALID_FUNCTION:
+   /* FALLTHROUGH */
+   case ERROR_CALL_NOT_IMPLEMENTED:
+   return ENOSYS;
+   }
+
+   return EINVAL;
+}
+
+static int
+thread_log_last_error(const char *message)
+{
+   DWORD error = GetLastError();
+   RTE_LOG(DEBUG, EAL, "GetLastError()=%lu: %s\n", error, message);
+
+   return thread_translate_win32_error(error);
+}
+
 rte_thread_t
 rte_thread_self(void)
 {
@@ -85,18 +133,18 @@ int
 rte_thread_key_create(rte_thread_key *key,
__rte_unused void (*destructor)(void *))
 {
+   int ret;
+
*key = malloc(sizeof(**key));
if ((*key) == NULL) {
RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
-   rte_errno = ENOMEM;
-   return -1;
+   return ENOMEM;
}
(*key)->thread_index = TlsAlloc();
if ((*key)->thread_index == TLS_OUT_OF_INDEXES) {
-   RTE_LOG_WIN32_ERR("TlsAlloc()");
+   ret = thread_log_last_error("TlsAlloc()");
free(*key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return ret;
}
return 0;
 }
@@ -104,16 +152,16 @@ rte_thread_key_create(rte_thread_key *key,
 int
 rte_thread_key_delete(rte_thread_key key)
 {
-   if (!key) {
+   int ret;
+
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
if (!TlsFree(key->thread_index)) {
-   RTE_LOG_WIN32_ERR("TlsFree()");
+   ret = thread_log_last_error("TlsFree()");
free(key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return ret;
}
free(key);
return 0;
@@ -122,19 +170,17 @@ rte_thread_key_delete(rte_thread_key key)
 int
 rte_thread_value_set(rte_thread_key key, const void *value)
 {
+   int ret;
char *p;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
/* discard const qualifier */
p = (char *) (uintptr_t) value;
if (!TlsSetValue(key->thread_index, p)) {
-   RTE_LOG_WIN32_ERR("TlsSetValue()");
-   rte_errno = ENOEXEC;
-   return -1;
+   return thread_log_last_error("TlsSetValue()");
}
return 0;
 }
@@ -143,16 +189,18 @@ void *
 rte_thread_value_get(rte_thread_key key)
 {
void *output;
+   DWORD ret = 0;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
rte_errno = EINVAL;
return NULL;
}
	output = TlsGetValue(key->thread_index);

[dpdk-dev] [PATCH v9 04/10] eal: implement functions for thread affinity management

2021-06-04 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement functions for getting/setting thread affinity.
Threads can be pinned to specific cores by setting their
affinity attribute.
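
(Not part of the patch.) A hedged sketch of pinning the calling thread with the new helpers,
as described above; CPU 2 is an arbitrary example and the CPU_* macros are assumed to come
from the platform's cpuset support:

/*
 * Sketch (illustration only): pin the calling thread to CPU 2 with the new
 * affinity helpers, then read the affinity back to confirm it.
 */
#include <rte_thread.h>

static int
pin_self_to_cpu2(void)
{
	rte_cpuset_t wanted, actual;
	int ret;

	CPU_ZERO(&wanted);
	CPU_SET(2, &wanted);

	ret = rte_thread_set_affinity_by_id(rte_thread_self(), &wanted);
	if (ret != 0)
		return ret; /* positive errno-style value */

	ret = rte_thread_get_affinity_by_id(rte_thread_self(), &actual);
	if (ret != 0)
		return ret;

	return CPU_ISSET(2, &actual) ? 0 : -1;
}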

Signed-off-by: Narcisa Vasile 
Signed-off-by: Dmitry Malloy 
---
 lib/eal/common/rte_thread.c   |  14 +++
 lib/eal/include/rte_thread.h  |  36 
 lib/eal/windows/eal_lcore.c   | 169 +-
 lib/eal/windows/eal_windows.h |  10 ++
 lib/eal/windows/rte_thread.c  | 127 -
 5 files changed, 310 insertions(+), 46 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 4b1e8f995e..ceb27feaa7 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -34,6 +34,20 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal(t1.opaque_id, t2.opaque_id);
 }
 
+int
+rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset)
+{
+   return pthread_setaffinity_np(thread_id.opaque_id, sizeof(*cpuset), cpuset);
+}
+
+int
+rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset)
+{
+   return pthread_getaffinity_np(thread_id.opaque_id, sizeof(*cpuset), cpuset);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index f3eeb28753..1f02962146 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -86,6 +86,42 @@ rte_thread_t rte_thread_self(void);
 __rte_experimental
 int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
+/**
+ * Set the affinity of thread 'thread_id' to the CPU set
+ * specified by 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to set the affinity.
+ *
+ * @param cpuset
+ *   Pointer to CPU affinity to set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset);
+
+/**
+ * Get the affinity of thread 'thread_id' and store it
+ * in 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to get the affinity.
+ *
+ * @param cpuset
+ *   Pointer for storing the affinity value.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset);
+
 /**
  * Initialize the attributes of a thread.
  * These attributes can be passed to the rte_thread_create() function
diff --git a/lib/eal/windows/eal_lcore.c b/lib/eal/windows/eal_lcore.c
index 476c2d2bdf..519a62b96d 100644
--- a/lib/eal/windows/eal_lcore.c
+++ b/lib/eal/windows/eal_lcore.c
@@ -2,7 +2,6 @@
  * Copyright(c) 2019 Intel Corporation
  */
 
-#include 
 #include 
 #include 
 
@@ -27,13 +26,15 @@ struct socket_map {
 };
 
 struct cpu_map {
-   unsigned int socket_count;
unsigned int lcore_count;
+   unsigned int socket_count;
+   unsigned int cpu_count;
struct lcore_map lcores[RTE_MAX_LCORE];
struct socket_map sockets[RTE_MAX_NUMA_NODES];
+   GROUP_AFFINITY cpus[CPU_SETSIZE];
 };
 
-static struct cpu_map cpu_map = { 0 };
+static struct cpu_map cpu_map;
 
 /* eal_create_cpu_map() is called before logging is initialized */
 static void
@@ -47,13 +48,111 @@ log_early(const char *format, ...)
va_end(va);
 }
 
+static int
+eal_query_group_affinity(void)
+{
+   SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL;
+   DWORD infos_size = 0;
+   int ret = 0;
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, NULL,
+ &infos_size)) {
+   DWORD error = GetLastError();
+   if (error != ERROR_INSUFFICIENT_BUFFER) {
+   log_early("Cannot get group information size, "
+ "error %lu\n", error);
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+   }
+
+   infos = malloc(infos_size);
+   if (infos == NULL) {
+   log_early("Cannot allocate memory for processor group information\n");
+   rte_errno = ENOMEM;
+   ret = -1;
+   goto cleanup;
+   }
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, infos,
+ &infos_size)) {
+   log_early("Cannot get group information, error %lu\n",
+ GetLastError());
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+
+   cpu_map.cpu_count = 0;
+   USHORT group_count = infos->Group.ActiveGroupCount;
+   for (USHORT group_number = 0; group_number < group_count; group_number++) {
+   KAFFINITY affinity = infos->Group.GroupInfo[group_number].ActiveProcessorMask;
