Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-14 Thread Xia, Chenbo
> -----Original Message-----
> From: Thomas Monjalon 
> Sent: Thursday, October 14, 2021 2:42 PM
> To: Harris, James R ; Walker, Benjamin
> ; Xia, Chenbo 
> Cc: Liu, Changpeng ; David Marchand
> ; dev@dpdk.org; Aaron Conole ;
> Zawadzki, Tomasz 
> Subject: Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs
> 
> 14/10/2021 04:21, Xia, Chenbo:
> > From: Thomas Monjalon 
> > > 13/10/2021 19:56, Walker, Benjamin:
> > > > > From: Thomas Monjalon 
> > > > >
> > > > > In order to be perfectly clear, all the changes done around this
> option
> > > > > enable_driver_sdk share the goal of tidying stuff in DPDK so that ABI
> > > becomes
> > > > > better manageable.
> > > > > I think that nobody want to annoy the SPDK project.
> > > > > I understand that the changes effectively add troubles, and I am sorry
> > > about
> > > > > that. If SPDK and other projects can manage with this change, good.
> > > > > If there is a real blocker, we should discuss what are the options.
> > > > >
> > > > > Thanks for your understanding
> > > >
> > > > I completely understand the desire to make the ABI manageable. If I were
> in
> > > your shoes, I'd be doing the same exact thing. What I don't currently
> > > understand is the motivation behind this enable_driver_sdk option. My
> guess is
> > > that it's one of two things.
> > > >
> > > > #1 ABI manageability: You say that's the purpose above, and that was my
> > > initial assumption. But wouldn't that necessarily mean, over time, no
> longer
> > > considering the symbols that were defined by the header files as part of
> the
> > > stable ABI?
> > >
> > > Absolutely. The idea is that we don't guarantee ABI for the drivers.
> > >
> > > > If you still consider these symbols as part of the ABI in shared library
> > > builds, then the enable_driver_sdk option does absolutely nothing to
> improve
> > > the ABI situation, so why bother to have it at all? We can't have packaged
> > > SPDK relying on symbols in a packaged DPDK that are not part of the
> official
> > > ABI.
> > >
> > > > #2 Not supporting out-of-tree drivers: Another option is that you just
> don't
> > > want people writing out of tree drivers.
> > >
> > > We don't want complications due to support of out-of-tree drivers,
> > > but we don't want to forbid them.
> > >
> > > > You can't just drop it outright because people already do it,
> > > > but you'd like to not support it for shared library builds at least.
> > >
> > > I didn't think about it in these terms.
> > > But saying we don't offer compatibility for shared library drivers
> > > is not too far of "no support" indeed.
> > >
> > > > So I'd like to really understand which of these two motivated the
> > > enable_driver_sdk option . Maybe it's not even one of the two above. If it
> is
> > > #1, then I think maybe we can work with DPDK to define a very small set of
> > > out-of-tree driver APIs/ABIs that need to continue to exist in the shared
> > > libraries by default. I do think SPDK needs only a very small number. If
> it's
> > > #2, then that's the entire SPDK use case and I'd ask you to reconsider the
> > > direction.
> > >
> > > Yes I think we need to agree on functions to keep as-is for compatibility.
> > > Waiting for your input please.
> >
> > So, do you mean currently DPDK doesn't guarantee ABI for drivers
> 
> Yes
> 
> > but could have driver ABI in the future?
> 
> I don't think so, not general compatibility,
> but we can think about a way to avoid breaking SPDK specifically,
> which has less requirements.

So the problem here is exposing some APIs to SPDK directly? Without the
'enable_driver_sdk' option, I don't see a way to have symbols that are both
exposed and not part of the ABI. Do you have an idea in mind?

Thanks,
Chenbo




Re: [dpdk-dev] [PATCH] test/hash: fix buffer overflow

2021-10-14 Thread David Marchand
Hello Vladimir,

On Wed, Oct 13, 2021 at 9:27 PM Medvedkin, Vladimir
 wrote:
> > With patch applied, ASan reports another issue.
> > Did you test your fix with ASan?
> >
>
> You're right, for some reason ASAN wasn't enabled.
> I applied patch and built running .ci/linux-build.sh,
> also I build with CFLAGS + LDFLAGS.
>
> Bruce suggested to use meson options instead of using CFLAGS, so
> meson configure build -Db_sanitize=address -Db_lundef=false
> works fine.

Well, yes, you can directly do this.
I linked to my GHA patch in the bz, because I find it easier and more
reproducible to push fixes in GHA and get the result: no question
about "did I enable ASan?" or "did I start the test correctly?".

FYI, b_lundef seems necessary only with clang, gcc should be fine without it.
IIUC, those compilers went with different choices on how to pull
libasan (clang went with static, gcc went with shared).
Hopefully, we will have something easier to use in DPDK with Zhihong's work.

>
> I'll sent v2 for this.

Thanks, I'll look at it.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v9 1/3] Enable ASan for memory detector on DPDK

2021-10-14 Thread Thomas Monjalon
14/10/2021 08:46, Peng, ZhihongX:
> From: David Marchand 
> > More problematic, linking an external (out of meson) application to a dpdk
> > compiled with ASan is broken.
> > 
> > My environment contains following targets compiled using
> > ./devtools/test-meson-builds.sh:
> > $ ls $HOME/builds/
> > build-arm64-bluefield  build-arm64-host-clang  build-clang-shared build-gcc-
> > shared  build-ppc64le-power8  build-x86-mingw
> > build-arm64-dpaa   build-arm64-octeontx2   build-clang-static
> > build-gcc-static  build-x86-generic
> > 
> > I stopped at patch 1, configured following target to have ASan in them, like
> > this:
> > 
> > $ meson configure $HOME/builds/build-gcc-static -Db_sanitize=address
> > $ meson configure $HOME/builds/build-clang-shared -Db_sanitize=address -
> > Db_lundef=false $ meson configure $HOME/builds/build-x86-generic -
> > Db_sanitize=address
> >^
> >This is the target for which we test linking 
> > a dpdk application
> > out of meson.
> > 
> > $ meson configure $HOME/builds/build-arm64-bluefield -
> > Db_sanitize=address
> 
> I don't know your test platform , arm or x86.
> 
> This is our compilation command:
> Gcc is 9.3.0 or 10.3.0
> CC=gcc meson -Db_sanitize=address x86_64-native-linuxapp-gcc
> ninja -C x86_64-native-linuxapp-gcc
> meson configure -Dexamples=helloworld x86_64-native-linuxapp-gcc
> ninja -C x86_64-native-linuxapp-gcc 
> 
> I don’t know how you get this parameter  $HOME/builds/build-x86-generic.
> 
> Can you send me all your configuration, I will reproduce this error in our 
> environment.

This is written just below:

> Thank you very much for your help!
> 
> > Then ran the check:
> > $ ./devtools/test-meson-builds.sh

This is the command that the contributing guide recommends running:
https://doc.dpdk.org/guides/contributing/patches.html#checking-compilation
You need, at the very minimum, to install an Arm cross-compiler.

As you are changing stuff in the compilation of the project,
please become familiar with testing compilation in multiple environments.





Re: [dpdk-dev] [PATCH v1 1/1] ci: enable DPDK GHA for arm64 with self-hosted runners

2021-10-14 Thread Serena He
Hi Michael, thanks for the feedback, and here are some comments below.

> On 10/13/21 4:03 AM, Serena He wrote:
> > CI jobs are triggered only for repos installed with given GHApp and
> > runners
> >
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Serena He 
> >
> > ---
> >   .github/workflows/build-arm64.yml | 118
> ++
> >   1 file changed, 118 insertions(+)
> >   create mode 100644 .github/workflows/build-arm64.yml
> >
> > diff --git a/.github/workflows/build-arm64.yml
> > b/.github/workflows/build-arm64.yml
> > new file mode 100644
> > index 00..570563f7c8
> > --- /dev/null
> > +++ b/.github/workflows/build-arm64.yml
> Adding a new workflow should work on our 0-day-bot. We now support
> having multiple workflows so this looks good

Great!

> > @@ -0,0 +1,118 @@
> > +name: build-arm64
> > +
> > +on:
> > +  push:
> > +  schedule:
> > +- cron: '0 0 * * 1'
> nit: Please add a comment for when this is scheduled so we dont have to do
> cron math :)

Sure, I will add that. 

> > +
> > +defaults:
> > +  run:
> > +shell: bash --noprofile --norc -exo pipefail {0}
> > +
> > +jobs:
> > +  build:
> > +# Here, runners for arm64 are accessed by installed GitHub APP, thus
> will not be available by fork.
> > +# you can change the following 'if' and 'runs-on' if you have your own
> runners installed.
> > +# or request to get your repo on the whitelist to use GitHub APP and
> delete this 'if'.
> I think I understand. I think you mean s/GitHub APP/GitHub/ . otherwise I
> dont know what that is. From my understanding you had to request special
> arm-based runners from github
> 
> Are DPDK/dpdk and ovsrobot/dpdk whitelisted to use the arm-based
> runners?
> 
> Maybe there was a thread about this in the past that I missed, but where and
> how do you get these arm-based runners from github?

GitHub Apps are integrations with the GitHub APIs, and the one used here
sends requests for arm-based runners from the AWS cloud. Documentation will be
provided along with the App after release to make this easier to understand.
DPDK/dpdk and ovsrobot/dpdk are both whitelisted to use it.

> > +if: ${{ github.repository == 'DPDK/dpdk' || github.repository ==
> 'ovsrobot/dpdk' }}
[...]
> > +- name: Generate cache keys
> > +  id: get_ref_keys
> > +  run: |
> > +echo -n '::set-output name=ccache::'
> > +echo 'ccache-${{ matrix.config.os }}-${{ matrix.config.compiler }}-
> ${{ matrix.config.cross }}-'$(date -u +%Y-w%W)
> > +echo -n '::set-output name=libabigail::'
> > +echo 'libabigail-${{ matrix.config.os }}'
> > +echo -n '::set-output name=abi::'
> > +echo 'abi-${{ matrix.config.os }}-${{ matrix.config.compiler }}-
> ${{ matrix.config.cross }}-${{ env.LIBABIGAIL_VERSION }}-
> ${{ env.REF_GIT_TAG }}'
> > +- name: Retrieve ccache cache
> > +  uses: actions/cache@v2
> > +  with:
> > +path: ~/.ccache
> > +key: ${{ steps.get_ref_keys.outputs.ccache }}-${{ github.ref }}
> > +restore-keys: |
> > +  ${{ steps.get_ref_keys.outputs.ccache }}-refs/heads/main
> > +- name: Retrieve libabigail cache
> > +  id: libabigail-cache
> > +  uses: actions/cache@v2
> > +  if: env.ABI_CHECKS == 'true'
> > +  with:
> > +path: libabigail
> > +key: ${{ steps.get_ref_keys.outputs.libabigail }}
> > +- name: Retrieve ABI reference cache
> > +  uses: actions/cache@v2
> > +  if: env.ABI_CHECKS == 'true'
> > +  with:
> > +path: reference
> > +key: ${{ steps.get_ref_keys.outputs.abi }}
> > +- name: Update APT cache
> > +  run: sudo apt update || true
> > +- name: Install packages
> > +  run: sudo apt install -y ccache libnuma-dev python3-setuptools
> > +python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev
> > +libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev 
> > libjansson-dev
> > +libarchive-dev zlib1g-dev pkgconf
> > +- name: Install libabigail build dependencies if no cache is available
> > +  if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-
> hit != 'true'
> > +  run: sudo apt install -y autoconf automake libtool pkg-config 
> > libxml2-
> dev
> > +  libdw-dev
> Lots of caching stuff. All of it needed?

All of this caching is consistent with the build workflow on GitHub-provided
runners, so it should be needed.



Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-14 Thread Thomas Monjalon
14/10/2021 09:00, Xia, Chenbo:
> From: Thomas Monjalon 
> > 14/10/2021 04:21, Xia, Chenbo:
> > > From: Thomas Monjalon 
> > > > Yes I think we need to agree on functions to keep as-is for 
> > > > compatibility.
> > > > Waiting for your input please.
> > >
> > > So, do you mean currently DPDK doesn't guarantee ABI for drivers
> > 
> > Yes
> > 
> > > but could have driver ABI in the future?
> > 
> > I don't think so, not general compatibility,
> > but we can think about a way to avoid breaking SPDK specifically,
> > which has less requirements.
> 
> So the problem here is exposing some APIs to SPDK directly? Without the 
> 'enable_driver_sdk'
> option, I don't see a solution of both exposed and not-ABI. Any idea in your 
> mind?

No, the idea is to keep using enable_driver_sdk.
But so far, there is no compatibility guarantee for the driver SDK.
The discussion is about which basic compatibility requirements SPDK needs.





Re: [dpdk-dev] [PATCH v2] net/virtio: handle Tx checksums correctly for tunnel packets

2021-10-14 Thread Xia, Chenbo
> -----Original Message-----
> From: Ivan Malov 
> Sent: Friday, September 17, 2021 2:50 AM
> To: dev@dpdk.org
> Cc: Maxime Coquelin ; sta...@dpdk.org; Andrew
> Rybchenko ; Xia, Chenbo ;
> Yuanhan Liu ; Olivier Matz
> 
> Subject: [PATCH v2] net/virtio: handle Tx checksums correctly for tunnel
> packets
> 
> Tx prepare method calls rte_net_intel_cksum_prepare(), which
> handles tunnel packets correctly, but Tx burst path does not
> take tunnel presence into account when computing the offsets.
> 
> Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ivan Malov 
> Reviewed-by: Andrew Rybchenko 
> ---
>  drivers/net/virtio/virtqueue.h | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index 03957b2bd0..b83ff32efb 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -620,19 +620,21 @@ static inline void
>  virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *cookie)
>  {
>   uint64_t csum_l4 = cookie->ol_flags & PKT_TX_L4_MASK;
> + uint16_t o_l23_len = (cookie->ol_flags & PKT_TX_TUNNEL_MASK) ?
> +  cookie->outer_l2_len + cookie->outer_l3_len : 0;
> 
>   if (cookie->ol_flags & PKT_TX_TCP_SEG)
>   csum_l4 |= PKT_TX_TCP_CKSUM;
> 
>   switch (csum_l4) {
>   case PKT_TX_UDP_CKSUM:
> - hdr->csum_start = cookie->l2_len + cookie->l3_len;
> + hdr->csum_start = o_l23_len + cookie->l2_len + cookie->l3_len;
>   hdr->csum_offset = offsetof(struct rte_udp_hdr, dgram_cksum);
>   hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>   break;
> 
>   case PKT_TX_TCP_CKSUM:
> - hdr->csum_start = cookie->l2_len + cookie->l3_len;
> + hdr->csum_start = o_l23_len + cookie->l2_len + cookie->l3_len;
>   hdr->csum_offset = offsetof(struct rte_tcp_hdr, cksum);
>   hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>   break;
> @@ -650,7 +652,8 @@ virtqueue_xmit_offload(struct virtio_net_hdr *hdr, struct
> rte_mbuf *cookie)
>   VIRTIO_NET_HDR_GSO_TCPV6 :
>   VIRTIO_NET_HDR_GSO_TCPV4;
>   hdr->gso_size = cookie->tso_segsz;
> - hdr->hdr_len = cookie->l2_len + cookie->l3_len + cookie->l4_len;
> + hdr->hdr_len = o_l23_len + cookie->l2_len + cookie->l3_len +
> +cookie->l4_len;
>   } else {
>   ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
>   ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> --
> 2.20.1

Reviewed-by: Chenbo Xia 
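
For readers following the thread, the offset computation that the patch fixes can be sketched standalone in C. This is only an illustration and does not use the DPDK headers: `struct pkt_meta`, `TX_TUNNEL_MASK` and `tx_csum_start()` are hypothetical stand-ins for the rte_mbuf length fields, PKT_TX_TUNNEL_MASK and the logic in virtqueue_xmit_offload(), and the flag value is made up.

```c
#include <stdint.h>

/* Hypothetical stand-in for the tunnel bits of ol_flags (value is made up). */
#define TX_TUNNEL_MASK (UINT64_C(0xF) << 45)

/* Hypothetical subset of the mbuf metadata used by the Tx offload path. */
struct pkt_meta {
	uint64_t ol_flags;
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
};

/*
 * Offset of the L4 header from the start of the packet.
 * For tunnel packets the outer L2/L3 headers come first, so their
 * lengths are added; for other packets the outer fields are ignored.
 */
static uint16_t
tx_csum_start(const struct pkt_meta *m)
{
	uint16_t o_l23_len = (m->ol_flags & TX_TUNNEL_MASK) ?
			     m->outer_l2_len + m->outer_l3_len : 0;

	return o_l23_len + m->l2_len + m->l3_len;
}
```

With l2_len = 14 and l3_len = 20 and no tunnel flag, the result is 34 whatever the outer fields contain. With the tunnel flag set, outer lengths of 14 and 20, and inner lengths of 30 and 20, the result is 84.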



Re: [dpdk-dev] [PATCH v2 1/6] eal/interrupts: implement get set APIs

2021-10-14 Thread David Marchand
On Tue, Oct 5, 2021 at 2:17 PM Harman Kalra  wrote:
> +struct rte_intr_handle *rte_intr_instance_alloc(uint32_t flags)
> +{
> +   struct rte_intr_handle *intr_handle;
> +   bool mem_allocator;

Regardless of the currently defined flags, we want to have an ABI
ready for future changes, so if there is a "flags" input parameter, it
must be checked against valid values.
You can build a RTE_INTR_ALLOC_KNOWN_FLAGS define that contains all
valid flags either in a private header or only in this .c file if no
other unit needs it.
Next, in this function:

    if ((flags & ~RTE_INTR_ALLOC_KNOWN_FLAGS) != 0) {
        rte_errno = EINVAL;
        return NULL;
    }

A check in unit tests is then a good thing to add so that developers
adding a new flag get a CI failure.

This is not a blocker as this API is still experimental, but please
let's do this from the start.
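
A minimal standalone sketch of this pattern, assuming plain libc instead of EAL: `INTR_ALLOC_DPDK_ALLOCATOR`, `INTR_ALLOC_KNOWN_FLAGS`, `struct intr_handle` and `intr_instance_alloc()` are hypothetical stand-ins for the names in the patch, the flag value is made up, and errno replaces rte_errno.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag value; the real one is defined by the patch. */
#define INTR_ALLOC_DPDK_ALLOCATOR 0x1u

/* Every flag the implementation understands, kept private to this file. */
#define INTR_ALLOC_KNOWN_FLAGS (INTR_ALLOC_DPDK_ALLOCATOR)

/* Hypothetical, reduced handle structure. */
struct intr_handle {
	int fd;
};

/* Returns NULL and sets errno to EINVAL when an unknown flag bit is set. */
struct intr_handle *
intr_instance_alloc(uint32_t flags)
{
	if ((flags & ~INTR_ALLOC_KNOWN_FLAGS) != 0) {
		errno = EINVAL;
		return NULL;
	}

	/* Sketch only: calloc() is used whichever allocator flag is set. */
	return calloc(1, sizeof(struct intr_handle));
}
```

A unit test can then pass a bit outside the known set and expect NULL with EINVAL, so that a new flag added without updating the known-flags define is caught.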


> +
> +   mem_allocator = (flags & RTE_INTR_ALLOC_DPDK_ALLOCATOR) != 0;
> +   if (mem_allocator)
> +   intr_handle = rte_zmalloc(NULL, sizeof(struct 
> rte_intr_handle),
> + 0);
> +   else
> +   intr_handle = calloc(1, sizeof(struct rte_intr_handle));


-- 
David Marchand



Re: [dpdk-dev] [PATCH v5] app/testpmd: add option to display extended statistics

2021-10-14 Thread Ferruh Yigit

On 9/15/2021 12:27 PM, Andrew Rybchenko wrote:

From: Ivan Ilchenko 

Add 'display-xstats' option for using in accompanying with Rx/Tx statistics
(i.e. 'stats-period' option or 'show port stats' interactive command) to
display specified list of extended statistics.

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Acked-by: Ajit Khaparde 


<...>


+static int
+alloc_xstats_display_info(portid_t pi)
+{
+   uint64_t **ids_supp = &ports[pi].xstats_info.ids_supp;
+   uint64_t **prev_values = &ports[pi].xstats_info.prev_values;
+   uint64_t **curr_values = &ports[pi].xstats_info.curr_values;
+
+   if (xstats_display_num == 0)
+   return 0;
+
+   *ids_supp = calloc(xstats_display_num, sizeof(**ids_supp));
+   if (*ids_supp == NULL)
+   return -ENOMEM;
+
+   *prev_values = calloc(xstats_display_num,
+ sizeof(**prev_values));
+   if (*prev_values == NULL)
+   return -ENOMEM;
+
+   *curr_values = calloc(xstats_display_num,
+ sizeof(**curr_values));
+   if (*curr_values == NULL)
+   return -ENOMEM;


It would be good to free the memory allocated above before returning.
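
One possible shape for that cleanup, as a standalone sketch rather than the actual testpmd code: `struct xstats_info`, `alloc_xstats_info()` and `free_xstats_info()` are hypothetical names, and the sketch assumes the structure is zero-initialised before the first call.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-port xstats display state. */
struct xstats_info {
	uint64_t *ids_supp;
	uint64_t *prev_values;
	uint64_t *curr_values;
};

static void
free_xstats_info(struct xstats_info *info)
{
	/* free(NULL) is a no-op, so partially allocated state is fine. */
	free(info->ids_supp);
	free(info->prev_values);
	free(info->curr_values);
	info->ids_supp = NULL;
	info->prev_values = NULL;
	info->curr_values = NULL;
}

static int
alloc_xstats_info(struct xstats_info *info, size_t num)
{
	if (num == 0)
		return 0;

	info->ids_supp = calloc(num, sizeof(*info->ids_supp));
	info->prev_values = calloc(num, sizeof(*info->prev_values));
	info->curr_values = calloc(num, sizeof(*info->curr_values));

	/* On any failure, release whatever was allocated before returning. */
	if (info->ids_supp == NULL || info->prev_values == NULL ||
	    info->curr_values == NULL) {
		free_xstats_info(info);
		return -ENOMEM;
	}

	return 0;
}
```

Routing every failure through a single free helper avoids repeating the cleanup in each error branch, and the same helper can be reused when the port is closed.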

<...>


@@ -2886,6 +2990,7 @@ close_port(portid_t pid)
  
  		if (is_proc_primary()) {

port_flow_flush(pi);
+   free_xstats_display_info(pi);


Why free only for the primary process?
Aren't these allocated at the testpmd level, per process?

<...>


+
+#define XSTAT_ID_INVALID UINT64_MAX


Is this macro used at all?




[dpdk-dev] [PATCH v3 0/5] enable protocol agnostic flow offloading in FDIR

2021-10-14 Thread Junfeng Guo
Protocol agnostic flow offloading in Flow Director is enabled by this
patch set based on the Parser Library using existing rte_flow raw API

[PATCH v3 1/5] net/ice/base: add method to disable FDIR SWAP option.
[PATCH v3 2/5] net/ice/base: add function to set HW profile for raw flow.
[PATCH v3 3/5] app/testpmd: update Max RAW pattern size to 512.
[PATCH v3 4/5] net/ice: enable protocol agnostic flow offloading in FDIR.
[PATCH v3 5/5] doc: enable protocol agnostic flow in FDIR.

Junfeng Guo (5):
  net/ice/base: add method to disable FDIR SWAP option
  net/ice/base: add function to set HW profile for raw flow
  app/testpmd: update Max RAW pattern size to 512
  net/ice: enable protocol agnostic flow offloading in FDIR
  doc: enable protocol agnostic flow in FDIR

* v3:
Added necessary base code for raw flow in FDIR.

* v2:
Enabled vxlan port add for raw flow and updated commit message

 app/test-pmd/cmdline_flow.c|   2 +-
 doc/guides/rel_notes/release_21_11.rst |   1 +
 drivers/net/ice/base/ice_flex_pipe.c   | 100 +-
 drivers/net/ice/base/ice_flex_pipe.h   |   6 +-
 drivers/net/ice/base/ice_flow.c|  86 -
 drivers/net/ice/base/ice_flow.h|   4 +
 drivers/net/ice/ice_ethdev.h   |   5 +
 drivers/net/ice/ice_fdir_filter.c  | 172 +
 drivers/net/ice/ice_generic_flow.c |   7 +
 drivers/net/ice/ice_generic_flow.h |   3 +
 10 files changed, 381 insertions(+), 5 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v3 1/5] net/ice/base: add method to disable FDIR SWAP option

2021-10-14 Thread Junfeng Guo
The SWAP Flag in the FDIR Programming Descriptor doesn't work, thus
add a method to disable the FDIR SWAP option by setting the swap and
inset register set with certain values. The boolean fd_swap is used
to enable/disable the SWAP option.
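
As an illustration only, the way the patch builds each 32-bit register value from four consecutive byte values can be sketched in isolation. `pack_swap_word()` is a hypothetical helper that does not exist in the driver, and the starting value passed to it is arbitrary here since ICE_SWAP_VALID is not shown in this patch.

```c
#include <stdint.h>

#define BITS_PER_BYTE 8

/*
 * Pack four consecutive byte values, starting at first, into one 32-bit
 * word with the first value in the least significant byte.
 */
static uint32_t
pack_swap_word(uint8_t first)
{
	uint32_t word = 0;
	unsigned int j;

	for (j = 0; j < 4; j++)
		word |= (uint32_t)(uint8_t)(first + j) << (j * BITS_PER_BYTE);

	return word;
}
```

A caller that writes several registers would advance the starting value by 4 for each one, which matches the running swap_val counter in the loop of the patch. The cast to uint32_t before shifting keeps the shift well defined when the top byte has its high bit set.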

Signed-off-by: Junfeng Guo 
---
 drivers/net/ice/base/ice_flex_pipe.c | 44 ++--
 drivers/net/ice/base/ice_flex_pipe.h |  3 +-
 drivers/net/ice/base/ice_flow.c  |  2 +-
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ice/base/ice_flex_pipe.c 
b/drivers/net/ice/base/ice_flex_pipe.c
index f35d59f4f5..06a233990f 100644
--- a/drivers/net/ice/base/ice_flex_pipe.c
+++ b/drivers/net/ice/base/ice_flex_pipe.c
@@ -4952,6 +4952,43 @@ ice_add_prof_attrib(struct ice_prof_map *prof, u8 ptg, 
u16 ptype,
return ICE_SUCCESS;
 }
 
+/**
+ * ice_disable_fd_swap - set register appropriately to disable FD swap
+ * @hw: pointer to the HW struct
+ * @prof_id: profile ID
+ */
+void ice_disable_fd_swap(struct ice_hw *hw, u16 prof_id)
+{
+   u8 swap_val = ICE_SWAP_VALID;
+   u8 i;
+   /* Since the SWAP Flag in the Programming Desc doesn't work,
+* here add method to disable the SWAP Option via setting
+* certain SWAP and INSET register set.
+*/
+   for (i = 0; i < hw->blk[ICE_BLK_FD].es.fvw / 4; i++) {
+   u32 raw_swap = 0;
+   u32 raw_in = 0;
+   u8 j;
+
+   for (j = 0; j < 4; j++) {
+   raw_swap |= (swap_val++) << (j * BITS_PER_BYTE);
+   raw_in |= ICE_INSET_DFLT << (j * BITS_PER_BYTE);
+   }
+
+   /* write the FDIR swap register set */
+   wr32(hw, GLQF_FDSWAP(prof_id, i), raw_swap);
+
+   ice_debug(hw, ICE_DBG_INIT, "swap wr(%d, %d): %x = %08x\n",
+   prof_id, i, GLQF_FDSWAP(prof_id, i), raw_swap);
+
+   /* write the FDIR inset register set */
+   wr32(hw, GLQF_FDINSET(prof_id, i), raw_in);
+
+   ice_debug(hw, ICE_DBG_INIT, "inset wr(%d, %d): %x = %08x\n",
+   prof_id, i, GLQF_FDINSET(prof_id, i), raw_in);
+   }
+}
+
 /**
  * ice_add_prof - add profile
  * @hw: pointer to the HW struct
@@ -4962,6 +4999,7 @@ ice_add_prof_attrib(struct ice_prof_map *prof, u8 ptg, 
u16 ptype,
  * @attr_cnt: number of elements in attrib array
  * @es: extraction sequence (length of array is determined by the block)
  * @masks: mask for extraction sequence
+ * @fd_swap: enable/disable FDIR paired src/dst fields swap option
  *
  * This function registers a profile, which matches a set of PTYPES with a
  * particular extraction sequence. While the hardware profile is allocated
@@ -4971,7 +5009,7 @@ ice_add_prof_attrib(struct ice_prof_map *prof, u8 ptg, 
u16 ptype,
 enum ice_status
 ice_add_prof(struct ice_hw *hw, enum ice_block blk, u64 id, u8 ptypes[],
 const struct ice_ptype_attributes *attr, u16 attr_cnt,
-struct ice_fv_word *es, u16 *masks)
+struct ice_fv_word *es, u16 *masks, bool fd_swap)
 {
u32 bytes = DIVIDE_AND_ROUND_UP(ICE_FLOW_PTYPE_MAX, BITS_PER_BYTE);
ice_declare_bitmap(ptgs_used, ICE_XLT1_CNT);
@@ -4991,7 +5029,7 @@ ice_add_prof(struct ice_hw *hw, enum ice_block blk, u64 
id, u8 ptypes[],
status = ice_alloc_prof_id(hw, blk, &prof_id);
if (status)
goto err_ice_add_prof;
-   if (blk == ICE_BLK_FD) {
+   if (blk == ICE_BLK_FD && fd_swap) {
/* For Flow Director block, the extraction sequence may
 * need to be altered in the case where there are paired
 * fields that have no match. This is necessary because
@@ -5002,6 +5040,8 @@ ice_add_prof(struct ice_hw *hw, enum ice_block blk, u64 
id, u8 ptypes[],
status = ice_update_fd_swap(hw, prof_id, es);
if (status)
goto err_ice_add_prof;
+   } else if (blk == ICE_BLK_FD) {
+   ice_disable_fd_swap(hw, prof_id);
}
status = ice_update_prof_masking(hw, blk, prof_id, masks);
if (status)
diff --git a/drivers/net/ice/base/ice_flex_pipe.h 
b/drivers/net/ice/base/ice_flex_pipe.h
index 9733c4b214..dd332312dd 100644
--- a/drivers/net/ice/base/ice_flex_pipe.h
+++ b/drivers/net/ice/base/ice_flex_pipe.h
@@ -61,10 +61,11 @@ bool ice_hw_ptype_ena(struct ice_hw *hw, u16 ptype);
 /* XLT2/VSI group functions */
 enum ice_status
 ice_vsig_find_vsi(struct ice_hw *hw, enum ice_block blk, u16 vsi, u16 *vsig);
+void ice_disable_fd_swap(struct ice_hw *hw, u16 prof_id);
 enum ice_status
 ice_add_prof(struct ice_hw *hw, enum ice_block blk, u64 id, u8 ptypes[],
 const struct ice_ptype_attributes *attr, u16 attr_cnt,
-struct ice_fv_word *es, u16 *masks);
+struct ice_fv_

[dpdk-dev] [PATCH v3 2/5] net/ice/base: add function to set HW profile for raw flow

2021-10-14 Thread Junfeng Guo
Based on the parser library, we can directly set HW profile and
associate the main/ctrl vsi.

Signed-off-by: Junfeng Guo 
---
 drivers/net/ice/base/ice_flex_pipe.c | 56 +++
 drivers/net/ice/base/ice_flex_pipe.h |  3 +
 drivers/net/ice/base/ice_flow.c  | 84 
 drivers/net/ice/base/ice_flow.h  |  4 ++
 4 files changed, 147 insertions(+)

diff --git a/drivers/net/ice/base/ice_flex_pipe.c 
b/drivers/net/ice/base/ice_flex_pipe.c
index 06a233990f..be8f014585 100644
--- a/drivers/net/ice/base/ice_flex_pipe.c
+++ b/drivers/net/ice/base/ice_flex_pipe.c
@@ -6365,3 +6365,59 @@ ice_rem_prof_id_flow(struct ice_hw *hw, enum ice_block 
blk, u16 vsi, u64 hdl)
 
return status;
 }
+
+/**
+ * ice_flow_assoc_hw_prof - add profile id flow for main/ctrl VSI flow entry
+ * @hw: pointer to the HW struct
+ * @blk: HW block
+ * @dest_vsi_handle: dest VSI handle
+ * @fdir_vsi_handle: fdir programming VSI handle
+ * @id: profile id (handle)
+ *
+ * Calling this function will update the hardware tables to enable the
+ * profile indicated by the ID parameter for the VSIs specified in the VSI
+ * array. Once successfully called, the flow will be enabled.
+ */
+enum ice_status
+ice_flow_assoc_hw_prof(struct ice_hw *hw, enum ice_block blk,
+  u16 dest_vsi_handle, u16 fdir_vsi_handle, int id)
+{
+   enum ice_status status = ICE_SUCCESS;
+   u16 vsi_num;
+   u16 vsig;
+
+   vsi_num = ice_get_hw_vsi_num(hw, dest_vsi_handle);
+   if (!ice_vsig_find_vsi(hw, blk, vsi_num, &vsig) && !vsig)
+   if (!ice_has_prof_vsig(hw, blk, vsig, id)) {
+   status = ice_add_prof_id_flow(hw, blk, vsi_num, id);
+   if (status) {
+   ice_debug(hw, ICE_DBG_FLOW, "HW profile add 
failed for main VSI flow entry, %d\n",
+ status);
+   goto err_add_prof;
+   }
+   }
+
+   if (blk != ICE_BLK_FD)
+   return status;
+
+   vsi_num = ice_get_hw_vsi_num(hw, fdir_vsi_handle);
+   if (!ice_vsig_find_vsi(hw, blk, vsi_num, &vsig) && !vsig)
+   if (!ice_has_prof_vsig(hw, blk, vsig, id)) {
+   status = ice_add_prof_id_flow(hw, blk, vsi_num, id);
+   if (status) {
+   ice_debug(hw, ICE_DBG_FLOW, "HW profile add 
failed for ctrl VSI flow entry, %d\n",
+ status);
+   goto err_add_entry;
+   }
+   }
+
+   return status;
+
+err_add_entry:
+   vsi_num = ice_get_hw_vsi_num(hw, dest_vsi_handle);
+   ice_rem_prof_id_flow(hw, blk, vsi_num, id);
+err_add_prof:
+   ice_flow_rem_prof(hw, blk, id);
+
+   return status;
+}
diff --git a/drivers/net/ice/base/ice_flex_pipe.h 
b/drivers/net/ice/base/ice_flex_pipe.h
index dd332312dd..23ba45564a 100644
--- a/drivers/net/ice/base/ice_flex_pipe.h
+++ b/drivers/net/ice/base/ice_flex_pipe.h
@@ -76,6 +76,9 @@ enum ice_status
 ice_add_prof_id_flow(struct ice_hw *hw, enum ice_block blk, u16 vsi, u64 hdl);
 enum ice_status
 ice_rem_prof_id_flow(struct ice_hw *hw, enum ice_block blk, u16 vsi, u64 hdl);
+enum ice_status
+ice_flow_assoc_hw_prof(struct ice_hw *hw, enum ice_block blk,
+  u16 dest_vsi_handle, u16 fdir_vsi_handle, int id);
 enum ice_status ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
 enum ice_status
 ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
diff --git a/drivers/net/ice/base/ice_flow.c b/drivers/net/ice/base/ice_flow.c
index 77b6b130c1..f699dbbc74 100644
--- a/drivers/net/ice/base/ice_flow.c
+++ b/drivers/net/ice/base/ice_flow.c
@@ -2524,6 +2524,90 @@ ice_flow_disassoc_prof(struct ice_hw *hw, enum ice_block 
blk,
return status;
 }
 
+#define FLAG_GTP_EH_PDU_LINK   BIT_ULL(13)
+#define FLAG_GTP_EH_PDUBIT_ULL(14)
+
+#define FLAG_GTPU_MSK  \
+   (FLAG_GTP_EH_PDU | FLAG_GTP_EH_PDU_LINK)
+#define FLAG_GTPU_DW   \
+   (FLAG_GTP_EH_PDU | FLAG_GTP_EH_PDU_LINK)
+#define FLAG_GTPU_UP   \
+   (FLAG_GTP_EH_PDU)
+/**
+ * ice_flow_set_hw_prof - Set HW flow profile based on the parsed profile info
+ * @hw: pointer to the HW struct
+ * @dest_vsi_handle: dest VSI handle
+ * @fdir_vsi_handle: fdir programming VSI handle
+ * @prof: stores parsed profile info from raw flow
+ * @blk: classification stage
+ */
+enum ice_status
+ice_flow_set_hw_prof(struct ice_hw *hw, u16 dest_vsi_handle,
+u16 fdir_vsi_handle, struct ice_parser_profile *prof,
+enum ice_block blk)
+{
+   int id = ice_find_first_bit(prof->ptypes, UINT16_MAX);
+   struct ice_flow_prof_params *params;
+   u8 fv_words = hw->blk[blk].es.fvw;
+   enum ice_status status;
+   u16 vsi_num;
+   int i, idx;
+
+   params = (struct ice_flow_prof_params *)ice_malloc(hw, siz

[dpdk-dev] [PATCH v3 3/5] app/testpmd: update Max RAW pattern size to 512

2021-10-14 Thread Junfeng Guo
Update max size for pattern in struct rte_flow_item_raw to enable
protocol agnostic flow offloading.

Signed-off-by: Junfeng Guo 
---
 app/test-pmd/cmdline_flow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 0b5856c7d5..c8f621a441 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -458,7 +458,7 @@ enum index {
 };
 
 /** Maximum size for pattern in struct rte_flow_item_raw. */
-#define ITEM_RAW_PATTERN_SIZE 40
+#define ITEM_RAW_PATTERN_SIZE 512
 
 /** Maximum size for GENEVE option data pattern in bytes. */
 #define ITEM_GENEVE_OPT_DATA_SIZE 124
-- 
2.25.1



[dpdk-dev] [PATCH v3 4/5] net/ice: enable protocol agnostic flow offloading in FDIR

2021-10-14 Thread Junfeng Guo
Protocol agnostic flow offloading in Flow Director is enabled by this
patch based on the Parser Library, using existing rte_flow raw API.

Note that the raw flow requires:
1. byte string of raw target packet bits.
2. byte string of mask of target packet.

Here is an example:
FDIR matching ipv4 dst addr with 1.2.3.4 and redirect to queue 3:

flow create 0 ingress pattern raw \
pattern spec \
080045144000401001020304 \
pattern mask \
 \
/ end actions queue index 3 / mark id 3 / end

Note that mask of some key bits (e.g., 0x0800 to indicate ipv4 proto)
is optional in our cases. To avoid redundancy, we just omit the mask
of 0x0800 (with 0x) in the mask byte string example. The prefix
'0x' for the spec and mask byte (hex) strings are also omitted here.
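
As a generic illustration of what a spec/mask pair expresses (this is not how the ice parser or the hardware implements matching), the sketch below checks a packet against a spec under a mask. `raw_match()` is a hypothetical helper, not a DPDK function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when pkt agrees with spec on every bit selected by mask.
 * Bits whose mask is 0 are "don't care".
 */
static bool
raw_match(const uint8_t *pkt, const uint8_t *spec, const uint8_t *mask,
	  size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((pkt[i] ^ spec[i]) & mask[i])
			return false;

	return true;
}
```

With a spec of 01 02 03 04 and a mask of ff ff ff 00, any packet starting with 01 02 03 matches, whatever its fourth byte.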

Signed-off-by: Junfeng Guo 
---
 drivers/net/ice/ice_ethdev.h   |   5 +
 drivers/net/ice/ice_fdir_filter.c  | 172 +
 drivers/net/ice/ice_generic_flow.c |   7 ++
 drivers/net/ice/ice_generic_flow.h |   3 +
 4 files changed, 187 insertions(+)

diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 5845f44c86..e21d2349bc 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -317,6 +317,11 @@ struct ice_fdir_filter_conf {
uint64_t input_set_o; /* used for non-tunnel or tunnel outer fields */
uint64_t input_set_i; /* only for tunnel inner fields */
uint32_t mark_flag;
+
+   struct ice_parser_profile *prof;
+   const u8 *pkt_buf;
+   bool parser_ena;
+   u8 pkt_len;
 };
 
 #define ICE_MAX_FDIR_FILTER_NUM(1024 * 16)
diff --git a/drivers/net/ice/ice_fdir_filter.c 
b/drivers/net/ice/ice_fdir_filter.c
index bd627e3aa8..4af6f371f4 100644
--- a/drivers/net/ice/ice_fdir_filter.c
+++ b/drivers/net/ice/ice_fdir_filter.c
@@ -107,6 +107,7 @@
ICE_INSET_NAT_T_ESP_SPI)
 
 static struct ice_pattern_match_item ice_fdir_pattern_list[] = {
+   {pattern_raw,   ICE_INSET_NONE, 
ICE_INSET_NONE, ICE_INSET_NONE},
{pattern_ethertype, ICE_FDIR_INSET_ETH, 
ICE_INSET_NONE, ICE_INSET_NONE},
{pattern_eth_ipv4,  
ICE_FDIR_INSET_ETH_IPV4,ICE_INSET_NONE, ICE_INSET_NONE},
{pattern_eth_ipv4_udp,  
ICE_FDIR_INSET_ETH_IPV4_UDP,ICE_INSET_NONE, ICE_INSET_NONE},
@@ -1188,6 +1189,24 @@ ice_fdir_is_tunnel_profile(enum ice_fdir_tunnel_type 
tunnel_type)
return 0;
 }
 
+static int
+ice_fdir_add_del_raw(struct ice_pf *pf,
+struct ice_fdir_filter_conf *filter,
+bool add)
+{
+   struct ice_hw *hw = ICE_PF_TO_HW(pf);
+
+   unsigned char *pkt = (unsigned char *)pf->fdir.prg_pkt;
+   rte_memcpy(pkt, filter->pkt_buf, filter->pkt_len);
+
+   struct ice_fltr_desc desc;
+   memset(&desc, 0, sizeof(desc));
+   filter->input.comp_report = ICE_FXD_FLTR_QW0_COMP_REPORT_SW;
+   ice_fdir_get_prgm_desc(hw, &filter->input, &desc, add);
+
+   return ice_fdir_programming(pf, &desc);
+}
+
 static int
 ice_fdir_add_del_filter(struct ice_pf *pf,
struct ice_fdir_filter_conf *filter,
@@ -1304,6 +1323,45 @@ ice_fdir_create_filter(struct ice_adapter *ad,
bool is_tun;
int ret;
 
+   if (filter->parser_ena) {
+   struct ice_hw *hw = ICE_PF_TO_HW(pf);
+
+   u16 ctrl_vsi = pf->fdir.fdir_vsi->idx;
+   u16 main_vsi = pf->main_vsi->idx;
+
+   ret = ice_flow_set_hw_prof(hw, main_vsi, ctrl_vsi,
+  filter->prof, ICE_BLK_FD);
+   if (ret)
+   return -rte_errno;
+
+   ret = ice_fdir_add_del_raw(pf, filter, true);
+   if (ret)
+   return -rte_errno;
+
+   if (filter->mark_flag == 1)
+   ice_fdir_rx_parsing_enable(ad, 1);
+
+   entry = rte_zmalloc("fdir_entry", sizeof(*entry), 0);
+   if (!entry)
+   return -rte_errno;
+
+   entry->pkt_buf = (u8 *)ice_malloc(hw, filter->pkt_len);
+   if (!entry->pkt_buf)
+   return -ENOMEM;
+
+   u8 *pkt_buf = (u8 *)ice_malloc(hw, filter->pkt_len);
+   if (!pkt_buf)
+   return -ENOMEM;
+
+   rte_memcpy(entry, filter, sizeof(*filter));
+   rte_memcpy(pkt_buf, filter->pkt_buf, filter->pkt_len);
+   entry->pkt_buf = pkt_buf;
+
+   flow->rule = entry;
+
+   return 0;
+   }
+
ice_fdir_extract_fltr_key(&key, filter);
node = ice_fdir_entry_lookup(fdir_info, &key);
if (node) {
@@ -1397,6 +14

[dpdk-dev] [PATCH v3 5/5] doc: enable protocol agnostic flow in FDIR

2021-10-14 Thread Junfeng Guo
Protocol agnostic flow offloading in Flow Director is enabled based
on the Parser Library, using existing rte_flow raw API.

Signed-off-by: Junfeng Guo 
---
 doc/guides/rel_notes/release_21_11.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index d5c762df62..5a46be0a72 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -98,6 +98,7 @@ New Features
 
 * **Updated Intel ice driver.**
 
+  * Added protocol agnostic flow offloading support in Flow Director.
   * Added 1PPS out support by a devargs.
   * Added IPv4 and L4 (TCP/UDP/SCTP) checksum hash support in RSS flow.
   * Added DEV_RX_OFFLOAD_TIMESTAMP support.
-- 
2.25.1



Re: [dpdk-dev] [PATCH v2] net/virtio: fix indirect descriptors reconnection

2021-10-14 Thread Maxime Coquelin




On 10/13/21 03:36, Xuan Ding wrote:

Add initialization for packed ring indirect descriptors
in reconnection path.

Fixes: 381f39ebb78a ("net/virtio: fix packed ring indirect descricptors setup")
Cc: sta...@dpdk.org
Cc: yong@intel.com

Signed-off-by: Xuan Ding 
Tested-by: Yinan Wang 
---

v2:
* Fix the position of some declarations.
---
  drivers/net/virtio/virtqueue.c | 14 ++
  1 file changed, 14 insertions(+)


Reviewed-by: Maxime Coquelin 

Thanks!
Maxime



Re: [dpdk-dev] [PATCH 08/32] net/ngbe: support basic statistics

2021-10-14 Thread Ferruh Yigit

On 10/14/2021 3:51 AM, Jiawen Wu wrote:

On September 16, 2021 12:51 AM, Ferruh Yigit wrote:

On 9/8/2021 9:37 AM, Jiawen Wu wrote:

Support to read and clear basic statistics, and configure per-queue
stats counter mapping.

Signed-off-by: Jiawen Wu 
---
  doc/guides/nics/features/ngbe.ini  |   2 +
  doc/guides/nics/ngbe.rst   |   1 +
  drivers/net/ngbe/base/ngbe_dummy.h |   5 +
  drivers/net/ngbe/base/ngbe_hw.c| 101 ++
  drivers/net/ngbe/base/ngbe_hw.h|   1 +
  drivers/net/ngbe/base/ngbe_type.h  | 134 +
  drivers/net/ngbe/ngbe_ethdev.c | 300 +
  drivers/net/ngbe/ngbe_ethdev.h |  19 ++
  8 files changed, 563 insertions(+)



<...>


+static int
+ngbe_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+   struct ngbe_hw *hw = ngbe_dev_hw(dev);
+   struct ngbe_hw_stats *hw_stats = NGBE_DEV_STATS(dev);
+   struct ngbe_stat_mappings *stat_mappings =
+   NGBE_DEV_STAT_MAPPINGS(dev);
+   uint32_t i, j;
+
+   ngbe_read_stats_registers(hw, hw_stats);
+
+   if (stats == NULL)
+   return -EINVAL;
+
+   /* Fill out the rte_eth_stats statistics structure */
+   stats->ipackets = hw_stats->rx_packets;
+   stats->ibytes = hw_stats->rx_bytes;
+   stats->opackets = hw_stats->tx_packets;
+   stats->obytes = hw_stats->tx_bytes;
+
+   memset(&stats->q_ipackets, 0, sizeof(stats->q_ipackets));
+   memset(&stats->q_opackets, 0, sizeof(stats->q_opackets));
+   memset(&stats->q_ibytes, 0, sizeof(stats->q_ibytes));
+   memset(&stats->q_obytes, 0, sizeof(stats->q_obytes));
+   memset(&stats->q_errors, 0, sizeof(stats->q_errors));
+   for (i = 0; i < NGBE_MAX_QP; i++) {
+   uint32_t n = i / NB_QMAP_FIELDS_PER_QSM_REG;
+   uint32_t offset = (i % NB_QMAP_FIELDS_PER_QSM_REG) * 8;
+   uint32_t q_map;
+
+   q_map = (stat_mappings->rqsm[n] >> offset)
+   & QMAP_FIELD_RESERVED_BITS_MASK;
+   j = (q_map < RTE_ETHDEV_QUEUE_STAT_CNTRS
+? q_map : q_map % RTE_ETHDEV_QUEUE_STAT_CNTRS);
+   stats->q_ipackets[j] += hw_stats->qp[i].rx_qp_packets;
+   stats->q_ibytes[j] += hw_stats->qp[i].rx_qp_bytes;
+
+   q_map = (stat_mappings->tqsm[n] >> offset)
+   & QMAP_FIELD_RESERVED_BITS_MASK;
+   j = (q_map < RTE_ETHDEV_QUEUE_STAT_CNTRS
+? q_map : q_map % RTE_ETHDEV_QUEUE_STAT_CNTRS);
+   stats->q_opackets[j] += hw_stats->qp[i].tx_qp_packets;
+   stats->q_obytes[j] += hw_stats->qp[i].tx_qp_bytes;
+   }
+
+   /* Rx Errors */
+   stats->imissed  = hw_stats->rx_total_missed_packets +
+ hw_stats->rx_dma_drop;
+   stats->ierrors  = hw_stats->rx_crc_errors +
+ hw_stats->rx_mac_short_packet_dropped +
+ hw_stats->rx_length_errors +
+ hw_stats->rx_undersize_errors +
+ hw_stats->rx_oversize_errors +
+ hw_stats->rx_illegal_byte_errors +
+ hw_stats->rx_error_bytes +
+ hw_stats->rx_fragment_errors;
+
+   /* Tx Errors */
+   stats->oerrors  = 0;
+   return 0;


You can consider keeping 'stats->rx_nombuf' stats too, this needs to be
calculated by driver.



I see 'stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed' in the function
rte_eth_stats_get, before calling the stats_get ops.
Should I write it again here?



You are right, I missed it. Just updating 'rx_mbuf_alloc_failed' is
sufficient.
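
To make the division of work above concrete, here is a minimal sketch using simplified stand-in types (every mock_* name is hypothetical and not a DPDK API). It assumes only what the thread states: the generic layer copies rx_mbuf_alloc_failed into rx_nombuf before calling the driver's stats_get op, so the driver only has to bump the counter when an mbuf allocation fails.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the ethdev structures; only the fields needed here. */
struct mock_dev_data {
	uint64_t rx_mbuf_alloc_failed; /* bumped by the driver Rx path */
};

struct mock_stats {
	uint64_t ipackets;
	uint64_t rx_nombuf;
};

struct mock_dev {
	struct mock_dev_data data;
	uint64_t hw_rx_packets; /* pretend hardware counter */
};

/* Driver Rx path: on mbuf allocation failure only the shared counter changes. */
static void
mock_rx_alloc_failed(struct mock_dev *dev)
{
	dev->data.rx_mbuf_alloc_failed++;
}

/* Driver stats_get op: fills hardware-derived counters, leaves rx_nombuf alone. */
static int
mock_driver_stats_get(struct mock_dev *dev, struct mock_stats *stats)
{
	if (stats == NULL)
		return -1;
	stats->ipackets = dev->hw_rx_packets;
	return 0;
}

/* Generic layer: copies rx_nombuf from device data, then calls the driver op. */
static int
mock_eth_stats_get(struct mock_dev *dev, struct mock_stats *stats)
{
	memset(stats, 0, sizeof(*stats));
	stats->rx_nombuf = dev->data.rx_mbuf_alloc_failed;
	return mock_driver_stats_get(dev, stats);
}
```

With this split, a driver that also wrote rx_nombuf in its own op would only repeat what the generic layer has already done.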


Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-14 Thread Xia, Chenbo
> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, October 14, 2021 3:08 PM
> To: Harris, James R ; Walker, Benjamin
> ; Xia, Chenbo 
> Cc: Liu, Changpeng ; David Marchand
> ; dev@dpdk.org; Aaron Conole ;
> Zawadzki, Tomasz 
> Subject: Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs
> 
> 14/10/2021 09:00, Xia, Chenbo:
> > From: Thomas Monjalon 
> > > 14/10/2021 04:21, Xia, Chenbo:
> > > > From: Thomas Monjalon 
> > > > > Yes I think we need to agree on functions to keep as-is for
> compatibility.
> > > > > Waiting for your input please.
> > > >
> > > > So, do you mean currently DPDK doesn't guarantee ABI for drivers
> > >
> > > Yes
> > >
> > > > but could have driver ABI in the future?
> > >
> > > I don't think so, not general compatibility,
> > > but we can think about a way to avoid breaking SPDK specifically,
> > > which has less requirements.
> >
> > So the problem here is exposing some APIs to SPDK directly? Without the
> 'enable_driver_sdk'
> > option, I don't see a solution of both exposed and not-ABI. Any idea in your
> mind?
> 
> No the idea is to keep using enable_driver_sdk.
> But so far, there is no compatibility guarantee for driver SDK.
> The discussion is about which basic compatibility requirement is needed for
> SPDK.

Sorry for not understanding your point quickly, but what's the difference between
'general compatibility' and 'basic compatibility'? In my mind, a struct or
function is either ABI-compatible or it is not. Could you help explain it a bit?

Thanks,
Chenbo

> 
> 



Re: [dpdk-dev] [PATCH v1] vhost: add sanity check for resubmiting reqs in split ring

2021-10-14 Thread Maxime Coquelin

Hi Li,

Adding Jin Yu who introduced this function.

On 8/27/21 07:12, Li Feng wrote:

When getting reqs from the avail ring, the id may exceed inflight
queue size. Then the dpdk will crash forever.


You need to add Fixes tag and Cc sta...@dpdk.org so that it can be
backported.


Signed-off-by: Li Feng 
---
  lib/vhost/vhost_user.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 29a4c9af60..f09d0f6a48 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
last_io = inflight_split->last_inflight_io;
  
  	if (inflight_split->used_idx != used->idx) {

-   inflight_split->desc[last_io].inflight = 0;
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+   if (unlikely(last_io >= inflight_split->desc_num)) {
+   VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' exceeds inflight "
+   "queue size (%"PRIu16").\n", last_io,
+   inflight_split->desc_num);


If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
instead of just logging an error?


+   } else {
+   inflight_split->desc[last_io].inflight = 0;
+   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+   }
inflight_split->used_idx = used->idx;
}
  



Regards,
Maxime
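
A rough sketch of what returning an error could look like, written against mock types so that it compiles on its own. The mock_* names, the fixed array size and the two-value result enum are assumptions made for illustration; the real function works on the vhost inflight structures and would return RTE_VHOST_MSG_RESULT_ERR.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the vhost message result codes. */
enum mock_msg_result { MOCK_MSG_RESULT_OK = 0, MOCK_MSG_RESULT_ERR = 1 };

struct mock_inflight_desc {
	uint8_t inflight;
};

/* Hypothetical, fixed-size stand-in for the split-ring inflight region. */
struct mock_inflight_split {
	uint16_t used_idx;
	uint16_t last_inflight_io;
	uint16_t desc_num; /* must not exceed the array size below */
	struct mock_inflight_desc desc[8];
};

static enum mock_msg_result
mock_check_inflight(struct mock_inflight_split *s, uint16_t used_idx)
{
	uint16_t last_io = s->last_inflight_io;

	if (s->used_idx != used_idx) {
		if (last_io >= s->desc_num) {
			fprintf(stderr,
				"last_inflight_io '%" PRIu16 "' exceeds inflight queue size (%" PRIu16 ")\n",
				last_io, s->desc_num);
			/* Fail the request instead of continuing with a bad index. */
			return MOCK_MSG_RESULT_ERR;
		}
		s->desc[last_io].inflight = 0;
		/* Plays the role of rte_atomic_thread_fence(__ATOMIC_SEQ_CST). */
		atomic_thread_fence(memory_order_seq_cst);
		s->used_idx = used_idx;
	}
	return MOCK_MSG_RESULT_OK;
}
```

In this variant used_idx is left untouched on the error path, so the caller sees the inconsistent state rather than a silently repaired one.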



Re: [dpdk-dev] [PATCH v1 1/1] ci: enable DPDK GHA for arm64 with self-hosted runners

2021-10-14 Thread Serena He
> 14/10/2021 06:20, Serena He:
> > From: Thomas Monjalon 
> > > 13/10/2021 10:03, Serena He:
> > > > CI jobs are triggered only for repos installed with given GHApp and
> runners
> > > [...]
> > > > +# Here, runners for arm64 are accessed by installed GitHub APP,
> thus
> > > will not be available by fork.
> > > > +# you can change the following 'if' and 'runs-on' if you have your
> own
> > > runners installed.
> > > > +# or request to get your repo on the whitelist to use GitHub APP 
> > > > and
> > > delete this 'if'.
> > > > +if: ${{ github.repository == 'DPDK/dpdk' || github.repository ==
> > > 'ovsrobot/dpdk' }}
> > >
> > > What is this "GitHub APP"?
> > >
> >
> > Apps on GitHub are integrations with the GitHub APIs.
> > This "GitHub APP" should be installed on repository
> 
> So GitHub has no native Arm support?

No, GitHub has no native Arm support.

> > and it will enable requests for arm-based runners.
> 
> Where it will run? Which servers?

Runners will run in AWS EC2 on Graviton2.

> > Sorry for not specifying this APP in the above comment.
> > It is to avoid unnecessary access from public.
> 
> You want to control who can run on these servers?
> There is no access control other than app installation?

The whitelist mentioned in the comment is for access control, but we hope the
link is not shared with anyone whose repo has not been whitelisted.

> > The installation link will be provided, as well as document, after release.
> 
> After release of the app? You mean it is not ready yet?

The APP is ready. It's waiting for a release date.

> 
> In current state of assumptions, it is a nack.
>


Re: [dpdk-dev] [PATCH v5 1/2] build: add meson options of atomic_mbuf_ref_counts

2021-10-14 Thread Bruce Richardson
On Thu, Oct 14, 2021 at 04:54:18AM +0800, Kefu Chai wrote:
> RTE_MBUF_REFCNT_ATOMIC = 0 is not necessary for applications like
> Seastar, where it's safe to assume that the mbuf refcnt is only
> updated by a single core only.
> 
> Signed-off-by: Kefu Chai 
> ---

For this, I think it's a setting that needs to be a global one for DPDK, so
I'm ok with adding it as a meson option.

Acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH v6 0/2] net: introduce IPv4 ihl and version fields

2021-10-14 Thread Ferruh Yigit

On 10/13/2021 6:13 PM, Gregory Etelson wrote:

Gregory Etelson (2):
   net: fix IPv4 change announce
   net: introduce IPv4 ihl and version fields



Hi Gregory,

Can you please change the order of the first and second patch?

This way I can get the first one, since it is already acked, before -rc1,
and continue reviews for second one, it will be OK since it is a
doc patch.

Thanks,
ferruh


Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Thomas Monjalon
13/10/2021 20:52, Thomas Monjalon:
> 13/10/2021 19:57, Harman Kalra:
> > From: dev  On Behalf Of Harman Kalra
> > > From: Thomas Monjalon 
> > > > 04/10/2021 11:57, David Marchand:
> > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra 
> > > > wrote:
> > > > > > > > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int 
> > > > > > > > size,
> > > > > > > > +  bool
> > > > > > > > +from_hugepage) {
> > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > +   int i;
> > > > > > > > +
> > > > > > > > +   if (from_hugepage)
> > > > > > > > +   intr_handle = rte_zmalloc(NULL,
> > > > > > > > + size * sizeof(struct 
> > > > > > > > rte_intr_handle),
> > > > > > > > + 0);
> > > > > > > > +   else
> > > > > > > > +   intr_handle = calloc(1, size * sizeof(struct
> > > > > > > > + rte_intr_handle));
> > > > > > >
> > > > > > > We can call DPDK allocator in all cases.
> > > > > > > That would avoid headaches on why multiprocess does not work in
> > > > > > > some rarely tested cases.
[...]
> > > > I agree with David.
> > > > I prefer a simpler API which always use rte_malloc, and make sure
> > > > interrupts are always handled between rte_eal_init and rte_eal_cleanup.
[...]
> > > There are couple of more dependencies on glibc heap APIs:
> > > 1. "rte_eal_alarm_init()" allocates an interrupt instance which is used 
> > > for
> > > timerfd, is called before "rte_eal_memory_init()" which does the memseg
> > > init.
> > > Not sure what all challenges we may face in moving alarm_init after
> > > memory_init as it might break some subsystem inits.
> > > Other option could be to allocate interrupt instance for timerfd on first
> > > alarm_setup call.
> 
> Indeed it is an issue.
> 
> [...]
> 
> > > There are many other drivers which statically declares the interrupt 
> > > handles
> > > inside their respective private structures and memory for those structure
> > > was allocated from heap. For such drivers I allocated interrupt instances 
> > > also
> > > using glibc heap APIs.
> 
> Could you use rte_malloc in these drivers?

If we take the direction of two different allocation modes for the interrupts,
I suggest we make it automatic without any API parameter.
We don't have any function to check rte_malloc readiness I think.
But we can detect whether shared memory is ready with this check:
rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC
This check is true at the end of rte_eal_init, so it is false during probing.
Would it be enough? Or should we implement rte_malloc_is_ready()?
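
A minimal sketch of the automatic selection, using mock types so that it compiles standalone. Everything named mock_* or MOCK_* is hypothetical: the magic value is a placeholder, mock_shared_zmalloc() stands in for rte_zmalloc(), and mock_malloc_is_ready() is the kind of helper that rte_malloc_is_ready() would be if it were implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_MAGIC 0x4d41474cu /* placeholder, not the real RTE_MAGIC value */

/* Stand-in for the shared memory config; magic is written once memory is ready. */
struct mock_mem_config {
	uint32_t magic;
};

static struct mock_mem_config mock_cfg;

/* Hypothetical readiness check, mirroring the magic comparison above. */
static bool
mock_malloc_is_ready(void)
{
	return mock_cfg.magic == MOCK_MAGIC;
}

struct mock_intr_handle {
	int fd;
	bool from_shared; /* remembers which allocator must free this handle */
};

/* Stand-in for rte_zmalloc(); in this mock it is simply calloc(). */
static void *
mock_shared_zmalloc(size_t size)
{
	return calloc(1, size);
}

/* No API parameter: the allocator is chosen from the readiness check. */
static struct mock_intr_handle *
mock_intr_handle_alloc(void)
{
	bool shared = mock_malloc_is_ready();
	struct mock_intr_handle *h;

	h = shared ? mock_shared_zmalloc(sizeof(*h)) : calloc(1, sizeof(*h));
	if (h == NULL)
		return NULL;
	h->fd = -1;
	h->from_shared = shared;
	return h;
}

static void
mock_intr_handle_free(struct mock_intr_handle *h)
{
	/* Real code would call rte_free() when from_shared is set. */
	free(h);
}
```

The handle records which allocator produced it, because an instance created before memory init must still be released with the matching free function later.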




Re: [dpdk-dev] [PATCH v5 2/2] build: add meson options of max_memseg_lists

2021-10-14 Thread Bruce Richardson
On Thu, Oct 14, 2021 at 04:54:19AM +0800, Kefu Chai wrote:
> RTE_MAX_MEMSEG_LISTS = 128 is not enough for many-core machines, in our
> case, we need to increase it to 8192. so add an option so user can
> override it.
> 
> Signed-off-by: Kefu Chai 

This seems a very low-level option to be exposing to the user. Some
thoughts/questions:

- can you give some more detail on why you need such a massive number, 64
  times the default?
- what would be the impact of increasing the default to 8192? I assume this
  is only used in a few places in EAL, so would the memory footprint
  increase be large?
- rather than a single specified value, would an alternative be to make
  this be a computed value at config time, scaled by number of lcores (or
  number of numa nodes)?

Regards,
/Bruce


Re: [dpdk-dev] [PATCH v1] vhost: add sanity check for resubmiting reqs in split ring

2021-10-14 Thread Maxime Coquelin




On 10/14/21 10:17, Maxime Coquelin wrote:

Hi Li,

Adding Jin Yu who introduced this function.


Looks like Jin Yu has left Intel, Chenbo, could you find someone from
the Intel SPDK team to look at it?


On 8/27/21 07:12, Li Feng wrote:

When getting reqs from the avail ring, the id may exceed inflight
queue size. Then the dpdk will crash forever.


You need to add Fixes tag and Cc sta...@dpdk.org so that it can be
backported.


Signed-off-by: Li Feng 
---
  lib/vhost/vhost_user.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 29a4c9af60..f09d0f6a48 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct 
virtio_net *dev,

  last_io = inflight_split->last_inflight_io;
  if (inflight_split->used_idx != used->idx) {
-    inflight_split->desc[last_io].inflight = 0;
-    rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+    if (unlikely(last_io >= inflight_split->desc_num)) {
+    VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' 
exceeds inflight "

+    "queue size (%"PRIu16").\n", last_io,
+    inflight_split->desc_num);


If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
instead of just logging an error?


+    } else {
+    inflight_split->desc[last_io].inflight = 0;
+    rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+    }
  inflight_split->used_idx = used->idx;
  }



Regards,
Maxime




Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-14 Thread Thomas Monjalon
14/10/2021 10:07, Xia, Chenbo:
> From: Thomas Monjalon 
> > 14/10/2021 09:00, Xia, Chenbo:
> > > From: Thomas Monjalon 
> > > > 14/10/2021 04:21, Xia, Chenbo:
> > > > > From: Thomas Monjalon 
> > > > > > Yes I think we need to agree on functions to keep as-is for
> > compatibility.
> > > > > > Waiting for your input please.
> > > > >
> > > > > So, do you mean currently DPDK doesn't guarantee ABI for drivers
> > > >
> > > > Yes
> > > >
> > > > > but could have driver ABI in the future?
> > > >
> > > > I don't think so, not general compatibility,
> > > > but we can think about a way to avoid breaking SPDK specifically,
> > > > which has less requirements.
> > >
> > > So the problem here is exposing some APIs to SPDK directly? Without the
> > 'enable_driver_sdk'
> > > option, I don't see a solution of both exposed and not-ABI. Any idea in 
> > > your
> > mind?
> > 
> > No the idea is to keep using enable_driver_sdk.
> > But so far, there is no compatibility guarantee for driver SDK.
> > The discussion is about which basic compatibility requirement is needed for
> > SPDK.
> 
> Sorry for not understanding your point quickly, but what's the difference of
> 'general compatibility' and 'basic compatibility'? Because in my mind, one
> struct or function should either be ABI-compatible or not. Could you help 
> explain
> it a bit?

I wonder whether we could have a guarantee for a subset of structs and 
functions.
Anyway, this is just opening the discussion to collect some inputs first.
Then we'll have to check what is possible and get a techboard approval.




Re: [dpdk-dev] [PATCH v4] eventdev/rx_adapter: add telemetry callbacks

2021-10-14 Thread Jerin Jacob
On Wed, Oct 13, 2021 at 5:39 PM Naga Harish K, S V
 wrote:
>
> Acked-by: Naga Harish K S V 
>
> > -Original Message-
> > From: dev  On Behalf Of Ganapati Kundapura
> > Sent: Wednesday, October 13, 2021 1:27 PM
> > To: jerinjac...@gmail.com; dev@dpdk.org
> > Cc: Jayatheerthan, Jay 
> > Subject: [dpdk-dev] [PATCH v4] eventdev/rx_adapter: add telemetry
> > callbacks
> >
> > Added telemetry callbacks to get Rx adapter stats, reset stats and to get Rx
> > queue config information.
> >
> > Acked-by: Jay Jayatheerthan 
> >
> > Signed-off-by: Ganapati Kundapura 
> > ---
> > v4:
> > * Addressed segfault when per Rx queue event buffer is used.
> >
> > v3:
> > * Updated release notes.
> > * Addressed review comments.
> >
> > v2:
> > * Fixed checkpatch warning.
> > ---
> >
> > diff --git a/doc/guides/rel_notes/release_21_11.rst
> > b/doc/guides/rel_notes/release_21_11.rst
> > index dfc2cbd..9955e52 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -130,6 +130,10 @@ New Features
> >* Added tests to validate packets hard expiry.
> >* Added tests to verify tunnel header verification in IPsec inbound.
> >
> > +* **Updated rte_event_eth_rx_adapter_stats structure
> > +  * Added 'uint64_t rx_event_buf_count'
> > +  * Added 'uint64_t rx_event_buf_size'
> > +

We need to add this to the ABI changes section. I updated the patch to add
the ABI change as below:

* eventdev: New variables ``rx_event_buf_count`` and ``rx_event_buf_size``
  were added in structure ``rte_event_eth_rx_adapter_stats`` to get additional
  status.

And updated the git log as below:

eventdev/rx_adapter: support telemetry

Added telemetry callbacks to get Rx adapter stats, reset stats and
to get Rx queue config information.

Signed-off-by: Ganapati Kundapura 
Acked-by: Jay Jayatheerthan 
Acked-by: Naga Harish K S V 
Acked-by: Jerin Jacob 


Applied to dpdk-next-net-eventdev/for-main. Thanks


Re: [dpdk-dev] [PATCH v1] vhost: add sanity check for resubmiting reqs in split ring

2021-10-14 Thread Xia, Chenbo
Hi Changpeng,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Thursday, October 14, 2021 4:26 PM
> To: Li Feng ; Xia, Chenbo 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1] vhost: add sanity check for resubmiting reqs in split
> ring
> 
> 
> 
> On 10/14/21 10:17, Maxime Coquelin wrote:
> > Hi Li,
> >
> > Adding Jin Yu who introduced this function.
> 
> Looks like Jin Yu has left Intel, Chenbo, could you find someone from
> the Intel SPDK team to look at it?

Could you or your team member help check this?

Thanks,
Chenbo

> 
> > On 8/27/21 07:12, Li Feng wrote:
> >> When getting reqs from the avail ring, the id may exceed inflight
> >> queue size. Then the dpdk will crash forever.
> >
> > You need to add Fixes tag and Cc sta...@dpdk.org so that it can be
> > backported.
> >
> >> Signed-off-by: Li Feng 
> >> ---
> >>   lib/vhost/vhost_user.c | 10 --
> >>   1 file changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> >> index 29a4c9af60..f09d0f6a48 100644
> >> --- a/lib/vhost/vhost_user.c
> >> +++ b/lib/vhost/vhost_user.c
> >> @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct
> >> virtio_net *dev,
> >>   last_io = inflight_split->last_inflight_io;
> >>   if (inflight_split->used_idx != used->idx) {
> >> -    inflight_split->desc[last_io].inflight = 0;
> >> -    rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >> +    if (unlikely(last_io >= inflight_split->desc_num)) {
> >> +    VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"'
> >> exceeds inflight "
> >> +    "queue size (%"PRIu16").\n", last_io,
> >> +    inflight_split->desc_num);
> >
> > If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> > instead of just logging an error?
> >
> >> +    } else {
> >> +    inflight_split->desc[last_io].inflight = 0;
> >> +    rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> >> +    }
> >>   inflight_split->used_idx = used->idx;
> >>   }
> >>
> >
> > Regards,
> > Maxime



Re: [dpdk-dev] [PATCH v1 1/1] ci: enable DPDK GHA for arm64 with self-hosted runners

2021-10-14 Thread Thomas Monjalon
14/10/2021 10:18, Serena He:
> > 14/10/2021 06:20, Serena He:
> > > From: Thomas Monjalon 
> > > > 13/10/2021 10:03, Serena He:
> > > > > CI jobs are triggered only for repos installed with given GHApp and
> > runners
> > > > [...]
> > > > > +# Here, runners for arm64 are accessed by installed GitHub APP,
> > thus
> > > > will not be available by fork.
> > > > > +# you can change the following 'if' and 'runs-on' if you have 
> > > > > your
> > own
> > > > runners installed.
> > > > > +# or request to get your repo on the whitelist to use GitHub APP 
> > > > > and
> > > > delete this 'if'.
> > > > > +if: ${{ github.repository == 'DPDK/dpdk' || github.repository ==
> > > > 'ovsrobot/dpdk' }}
> > > >
> > > > What is this "GitHub APP"?
> > > >
> > >
> > > Apps on GitHub are integrations with the GitHub APIs.
> > > This "GitHub APP" should be installed on repository
> > 
> > So GitHub has no native Arm support?
> 
> No, GitHub has no native Arm support.
> 
> > > and it will enable requests for arm-based runners.
> > 
> > Where it will run? Which servers?
> 
> Runners will run in AWS EC2 on Graviton2.
> 
> > > Sorry for not specifying this APP in the above comment.
> > > It is to avoid unnecessary access from public.
> > 
> > You want to control who can run on these servers?
> > There is no access control other than app installation?
> 
> Whitelist mentioned in the comment is for access control, but we hope the 
> link is not shared with anyone who has no repo get whitelisted.

You mean the link is enough to run on your AWS instance?
There is no key control when running?
It looks really weak, and I don't want to merge anything in DPDK related
to some secret app.

> > > The installation link will be provided, as well as document, after 
> > > release.
> > 
> > After release of the app? You mean it is not ready yet?
> 
> The APP has been ready. It's waiting for a release date.

> > In current state of assumptions, it is a nack.
So I confirm the nack for now.





Re: [dpdk-dev] [PATCH v5 2/2] build: add meson options of max_memseg_lists

2021-10-14 Thread Thomas Monjalon
14/10/2021 10:25, Bruce Richardson:
> On Thu, Oct 14, 2021 at 04:54:19AM +0800, Kefu Chai wrote:
> > RTE_MAX_MEMSEG_LISTS = 128 is not enough for many-core machines, in our
> > case, we need to increase it to 8192. so add an option so user can
> > override it.
> > 
> > Signed-off-by: Kefu Chai 
> 
> This seems a very low-level option to be exposing to the user. Some
> thoughts/questions:
> 
> - can you give some more detail on why you need such a massive number, 64
>   times the default?
> - what would be the impact of increasing the default to 8192? I assume this
>   is only used in a few places in EAL, so would the memory footprint
>   increase be large?
> - rather than a single specified value, would an alternative be to make
>   this be a computed value at config time, scaled by number of lcores (or
>   number of numa nodes)?

+1 for these suggestions




Re: [dpdk-dev] [PATCH v6 0/2] net: introduce IPv4 ihl and version fields

2021-10-14 Thread Thomas Monjalon
14/10/2021 10:21, Ferruh Yigit:
> On 10/13/2021 6:13 PM, Gregory Etelson wrote:
> > Gregory Etelson (2):
> >net: fix IPv4 change announce
> >net: introduce IPv4 ihl and version fields
> > 
> 
> Hi Gregory,
> 
> Can you please change the order of the first and second patch?
> 
> This way I can get the first one, since it is already acked, before -rc1,
> and continue reviews for second one, it will be OK since it is a
> doc patch.

It makes more sense in this order I think.
The first patch is just dropping a deprecation note, I can ack.




Re: [dpdk-dev] [PATCH v2] test/hash: fix buffer overflow

2021-10-14 Thread David Marchand
On Wed, Oct 13, 2021 at 9:28 PM Vladimir Medvedkin
 wrote:
>
> This patch fixes buffer overflow reported by ASAN,
> please reference https://bugs.dpdk.org/show_bug.cgi?id=818
>
> Some tests for the rte_hash table use the rte_jhash_32b() as
> the hash function. This hash function interprets the length
> argument in units of 4 bytes.
>
> This patch divides configured key length by 4 in cases when
> rte_jhash_32b() is used.
>
> For some tests rte_jhash() is used with keys of length not
> a multiple of 4 bytes. From the rte_jhash() documentation:
> If input key is not aligned to four byte boundaries or a
> multiple of four bytes in length, the memory region just
> after may be read (but not used in the computation).
>
> This patch increases the size of the proto field of the
> flow_key struct up to uint32_t and sets the alignment to 4 bytes.
>
> Bugzilla ID: 818
> Fixes: af75078fece3 ("first public release")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Vladimir Medvedkin 
> ---
>  app/test/test_hash.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/app/test/test_hash.c b/app/test/test_hash.c
> index bd4d0cb..e3f2d29 100644
> --- a/app/test/test_hash.c
> +++ b/app/test/test_hash.c
> @@ -80,8 +80,8 @@ struct flow_key {
> uint32_t ip_dst;
> uint16_t port_src;
> uint16_t port_dst;
> -   uint8_t proto;
> -} __rte_packed;
> +   uint32_t proto;
> +} __rte_packed __rte_aligned(sizeof(uint32_t));

If in the future, we add a field not multiple of sizeof(uint32_t),
there will be some padding at the end of the structure.
I *think* holes and padding content is undefined for initialized
objects (though maybe things could be different with objects in .data
?).
That's probably something to confirm.
If this is the case, the hash function would consider random data.

I think growing the proto field to uint32_t like you did is the right
fix since the whole structure is now naturally uint32_t aligned.

But I would remove the aligned attribute and prefer
RTE_BUILD_BUG_ON(sizeof(struct flow_key) % sizeof(uint32_t) != 0).
Maybe add a comment to explain we keep the packed attribute to avoid
holes with potentially undefined content in the middle of this struct.
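
As an illustration only, here is a standalone sketch of that layout check. It uses C11 _Static_assert in place of the DPDK build-time macro and the GCC/Clang packed attribute in place of __rte_packed; the ip_src field is assumed, since it is not visible in the quoted hunk, and key_len_in_words() is a hypothetical helper showing the divide-by-4 needed when the hash function takes its length in 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packed so that no holes with undefined content can appear in the middle
 * of the key; proto is widened to 32 bits so the total size stays a
 * multiple of 4 bytes.
 */
struct flow_key_sketch {
	uint32_t ip_src; /* assumed field, not shown in the quoted hunk */
	uint32_t ip_dst;
	uint16_t port_src;
	uint16_t port_dst;
	uint32_t proto;
} __attribute__((packed));

/* Build-time guard: fails to compile if a later field breaks the invariant. */
_Static_assert(sizeof(struct flow_key_sketch) % sizeof(uint32_t) == 0,
	       "flow key size must be a multiple of 4 bytes");

/* Convert a key length in bytes to the 32-bit word count a word-based hash expects. */
static inline uint32_t
key_len_in_words(uint32_t key_len_bytes)
{
	return key_len_bytes / (uint32_t)sizeof(uint32_t);
}
```

The guard turns a future layout mistake into a compile error instead of a hash computed over bytes past the end of the key.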


-- 
David Marchand



Re: [dpdk-dev] [PATCH v6 1/2] net: fix IPv4 change announce

2021-10-14 Thread Thomas Monjalon
13/10/2021 19:13, Gregory Etelson:
> IPv4 header encodes fragment information into 16 bits field.
> 3 bits hold flags and remaining 13 bits are for fragment offset.
> 13 bits bit-field cannot be defined both for big and little endian
> systems.
> 
> The patch removes IPv4 fragments union announce.
> 
> Fixes: f7383e7c7ec1 ("net: announce changes in IPv4 header access")
> 
> Signed-off-by: Gregory Etelson 

OK to drop this announce.
There is no implementation anyway,
it will be back in one year if there is a solution.

Acked-by: Thomas Monjalon 
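
For reference, a small sketch of the mask-and-shift approach that works on any host without bit-fields. The constant and function names are local to this example; the layout (3 flag bits followed by a 13-bit offset counted in 8-byte units) is the one described in the commit message and in the IPv4 specification.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_DF_FLAG      0x4000u /* "don't fragment" bit */
#define FRAG_MF_FLAG      0x2000u /* "more fragments" bit */
#define FRAG_OFFSET_MASK  0x1fffu /* low 13 bits: fragment offset */
#define FRAG_OFFSET_UNITS 8u      /* offset is counted in 8-byte blocks */

/* Read a 16-bit big-endian value byte by byte, independent of host endianness. */
static uint16_t
load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static int
frag_dont_fragment(uint16_t frag)
{
	return (frag & FRAG_DF_FLAG) != 0;
}

static int
frag_more_fragments(uint16_t frag)
{
	return (frag & FRAG_MF_FLAG) != 0;
}

/* Fragment offset converted from 8-byte units to bytes. */
static uint32_t
frag_offset_bytes(uint16_t frag)
{
	return (uint32_t)(frag & FRAG_OFFSET_MASK) * FRAG_OFFSET_UNITS;
}
```

Because the field is decoded from bytes in wire order, the same code gives the same result on big and little endian systems, which a 13-bit bit-field cannot guarantee.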




[dpdk-dev] [PATCH 0/5] ethdev: cosmetic fixes for just moved structures

2021-10-14 Thread Andrew Rybchenko
Since the rte_eth_dev and rte_eth_dev_data structures have just been moved,
now is a good chance to make a cleanup.

No strong opinion, but I think it would be useful for the future.

Maybe at least some of the fixes below could be accepted.

Andrew Rybchenko (5):
  ethdev: avoid documentation in next lines
  ethdev: fix Rx/Tx spelling in just moved structures
  ethdev: remove reserved fields from internal structures
  ethdev: make device and data structures readable
  ethdev: remove full stop after short comments and references

 lib/ethdev/ethdev_driver.h | 124 +++--
 1 file changed, 64 insertions(+), 60 deletions(-)

-- 
2.30.2



[dpdk-dev] [PATCH 1/5] ethdev: avoid documentation in next lines

2021-10-14 Thread Andrew Rybchenko
Documentation in the next separate line is confusing. If documentation
requires its own line, it should be before, not after.

Fixes: 9f3eb8826450 ("ethdev: hide eth dev related structures")

Signed-off-by: Andrew Rybchenko 
---
 lib/ethdev/ethdev_driver.h | 72 ++
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0174ba03d7..e5c7d08160 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -44,18 +44,17 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+   /** Pointer to PMD transmit prepare function. */
eth_tx_prep_t tx_pkt_prepare;
-   /**< Pointer to PMD transmit prepare function. */
+   /** Get the number of used RX descriptors. */
eth_rx_queue_count_t rx_queue_count;
-   /**< Get the number of used RX descriptors. */
+   /** Check the status of a Rx descriptor. */
eth_rx_descriptor_status_t rx_descriptor_status;
-   /**< Check the status of a Rx descriptor. */
+   /** Check the status of a Tx descriptor. */
eth_tx_descriptor_status_t tx_descriptor_status;
-   /**< Check the status of a Tx descriptor. */
 
/**
-* points to device data that is shared between
-* primary and secondary processes.
+* Device data that is shared between primary and secondary processes.
 */
struct rte_eth_dev_data *data;
void *process_private; /**< Pointer to per-process device data. */
@@ -100,64 +99,63 @@ struct rte_eth_dev_data {
 
struct rte_eth_dev_sriov sriov;/**< SRIOV data */
 
+   /** PMD-specific private data. @see rte_eth_dev_release_port(). */
void *dev_private;
-   /**< PMD-specific private data.
-*   @see rte_eth_dev_release_port()
-*/
 
struct rte_eth_link dev_link;   /**< Link-level information & status. */
struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
uint16_t mtu;   /**< Maximum Transmission Unit. */
+   /** Common RX buffer size handled by all queues. */
uint32_t min_rx_buf_size;
-   /**< Common RX buffer size handled by all queues. */
 
uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
+   /** Device Ethernet link address. @see rte_eth_dev_release_port(). */
struct rte_ether_addr *mac_addrs;
-   /**< Device Ethernet link address.
-*   @see rte_eth_dev_release_port()
-*/
+   /** Bitmap associating MAC addresses to pools. */
uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
-   /**< Bitmap associating MAC addresses to pools. */
+   /**
+* Device Ethernet MAC addresses of hash filtering.
+* @see rte_eth_dev_release_port()
+*/
struct rte_ether_addr *hash_mac_addrs;
-   /**< Device Ethernet MAC addresses of hash filtering.
-*   @see rte_eth_dev_release_port()
-*/
uint16_t port_id;   /**< Device [external] port identifier. */
 
__extension__
-   uint8_t promiscuous   : 1,
-   /**< RX promiscuous mode ON(1) / OFF(0). */
+   uint8_t /** RX promiscuous mode ON(1) / OFF(0). */
+   promiscuous   : 1,
+   /** RX of scattered packets is ON(1) / OFF(0) */
scattered_rx : 1,
-   /**< RX of scattered packets is ON(1) / OFF(0) */
+   /** RX all multicast mode ON(1) / OFF(0). */
all_multicast : 1,
-   /**< RX all multicast mode ON(1) / OFF(0). */
+   /** Device state: STARTED(1) / STOPPED(0). */
dev_started : 1,
-   /**< Device state: STARTED(1) / STOPPED(0). */
+   /** RX LRO is ON(1) / OFF(0) */
lro : 1,
-   /**< RX LRO is ON(1) / OFF(0) */
-   dev_configured : 1;
-   /**< Indicates whether the device is configured.
-*   CONFIGURED(1) / NOT CONFIGURED(0).
+   /**
+* Indicates whether the device is configured.
+* CONFIGURED(1) / NOT CONFIGURED(0).
 */
+   dev_configured : 1;
+   /** Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-   /**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
+   /** Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-   /**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0)

[dpdk-dev] [PATCH 2/5] ethdev: fix Rx/Tx spelling in just moved structures

2021-10-14 Thread Andrew Rybchenko
Fixes: 9f3eb8826450 ("ethdev: hide eth dev related structures")

Signed-off-by: Andrew Rybchenko 
---
 lib/ethdev/ethdev_driver.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index e5c7d08160..af9f379692 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -46,7 +46,7 @@ struct rte_eth_dev {
eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
/** Pointer to PMD transmit prepare function. */
eth_tx_prep_t tx_pkt_prepare;
-   /** Get the number of used RX descriptors. */
+   /** Get the number of used Rx descriptors. */
eth_rx_queue_count_t rx_queue_count;
/** Check the status of a Rx descriptor. */
eth_rx_descriptor_status_t rx_descriptor_status;
@@ -92,10 +92,10 @@ struct rte_eth_dev_owner;
 struct rte_eth_dev_data {
char name[RTE_ETH_NAME_MAX_LEN]; /**< Unique identifier name */
 
-   void **rx_queues; /**< Array of pointers to RX queues. */
-   void **tx_queues; /**< Array of pointers to TX queues. */
-   uint16_t nb_rx_queues; /**< Number of RX queues. */
-   uint16_t nb_tx_queues; /**< Number of TX queues. */
+   void **rx_queues; /**< Array of pointers to Rx queues. */
+   void **tx_queues; /**< Array of pointers to Tx queues. */
+   uint16_t nb_rx_queues; /**< Number of Rx queues. */
+   uint16_t nb_tx_queues; /**< Number of Tx queues. */
 
struct rte_eth_dev_sriov sriov;/**< SRIOV data */
 
@@ -105,10 +105,10 @@ struct rte_eth_dev_data {
struct rte_eth_link dev_link;   /**< Link-level information & status. */
struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
uint16_t mtu;   /**< Maximum Transmission Unit. */
-   /** Common RX buffer size handled by all queues. */
+   /** Common Rx buffer size handled by all queues. */
uint32_t min_rx_buf_size;
 
-   uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
+   uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures. */
/** Device Ethernet link address. @see rte_eth_dev_release_port(). */
struct rte_ether_addr *mac_addrs;
/** Bitmap associating MAC addresses to pools. */
@@ -121,15 +121,15 @@ struct rte_eth_dev_data {
uint16_t port_id;   /**< Device [external] port identifier. */
 
__extension__
-   uint8_t /** RX promiscuous mode ON(1) / OFF(0). */
+   uint8_t /** Rx promiscuous mode ON(1) / OFF(0). */
promiscuous   : 1,
-   /** RX of scattered packets is ON(1) / OFF(0) */
+   /** Rx of scattered packets is ON(1) / OFF(0) */
scattered_rx : 1,
-   /** RX all multicast mode ON(1) / OFF(0). */
+   /** Rx all multicast mode ON(1) / OFF(0). */
all_multicast : 1,
/** Device state: STARTED(1) / STOPPED(0). */
dev_started : 1,
-   /** RX LRO is ON(1) / OFF(0) */
+   /** Rx LRO is ON(1) / OFF(0) */
lro : 1,
/**
 * Indicates whether the device is configured.
-- 
2.30.2



[dpdk-dev] [PATCH 3/5] ethdev: remove reserved fields from internal structures

2021-10-14 Thread Andrew Rybchenko
Fixes: 9f3eb8826450 ("ethdev: hide eth dev related structures")

Signed-off-by: Andrew Rybchenko 
---
 lib/ethdev/ethdev_driver.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index af9f379692..80d5784166 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -75,9 +75,6 @@ struct rte_eth_dev {
struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
enum rte_eth_dev_state state; /**< Flag indicating the port state */
void *security_ctx; /**< Context for security ops */
-
-   uint64_t reserved_64s[4]; /**< Reserved for future fields */
-   void *reserved_ptrs[4];   /**< Reserved for future fields */
 } __rte_cache_aligned;
 
 struct rte_eth_dev_sriov;
@@ -158,8 +155,6 @@ struct rte_eth_dev_data {
uint16_t backer_port_id;
 
pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex. */
-   uint64_t reserved_64s[4]; /**< Reserved for future fields */
-   void *reserved_ptrs[4];   /**< Reserved for future fields */
 } __rte_cache_aligned;
 
 /**
-- 
2.30.2



[dpdk-dev] [PATCH 4/5] ethdev: make device and data structures readable

2021-10-14 Thread Andrew Rybchenko
Add empty lines to separate fields commented using different styles.

Fixes: 9f3eb8826450 ("ethdev: hide eth dev related structures")

Signed-off-by: Andrew Rybchenko 
---
 lib/ethdev/ethdev_driver.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 80d5784166..0dd5dc6f61 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -44,6 +44,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+
/** Pointer to PMD transmit prepare function. */
eth_tx_prep_t tx_pkt_prepare;
/** Get the number of used Rx descriptors. */
@@ -61,6 +62,7 @@ struct rte_eth_dev {
const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
struct rte_device *device; /**< Backing device */
struct rte_intr_handle *intr_handle; /**< Device interrupt handle */
+
/** User application callbacks for NIC interrupts */
struct rte_eth_dev_cb_list link_intr_cbs;
/**
@@ -73,6 +75,7 @@ struct rte_eth_dev {
 * received packets before passing them to the driver for transmission.
 */
struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+
enum rte_eth_dev_state state; /**< Flag indicating the port state */
void *security_ctx; /**< Context for security ops */
 } __rte_cache_aligned;
@@ -102,10 +105,12 @@ struct rte_eth_dev_data {
struct rte_eth_link dev_link;   /**< Link-level information & status. */
struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
uint16_t mtu;   /**< Maximum Transmission Unit. */
+
/** Common Rx buffer size handled by all queues. */
uint32_t min_rx_buf_size;
 
uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures. */
+
/** Device Ethernet link address. @see rte_eth_dev_release_port(). */
struct rte_ether_addr *mac_addrs;
/** Bitmap associating MAC addresses to pools. */
@@ -115,6 +120,7 @@ struct rte_eth_dev_data {
 * @see rte_eth_dev_release_port()
 */
struct rte_ether_addr *hash_mac_addrs;
+
uint16_t port_id;   /**< Device [external] port identifier. */
 
__extension__
@@ -133,15 +139,20 @@ struct rte_eth_dev_data {
 * CONFIGURED(1) / NOT CONFIGURED(0).
 */
dev_configured : 1;
+
/** Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
/** Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
+
uint32_t dev_flags; /**< Capabilities. */
int numa_node;  /**< NUMA node connection. */
+
/** VLAN filter configuration. */
struct rte_vlan_filter_conf vlan_filter_conf;
+
struct rte_eth_dev_owner owner; /**< The port owner. */
+
/**
 * Switch-specific identifier.
 * Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
-- 
2.30.2
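
For readers less familiar with the two Doxygen styles this series keeps apart, here is a minimal sketch. A `/** ... */` comment documents the member that follows it, while a `/**< ... */` comment documents the member that precedes it, which is why mixing them without an empty line is hard to read. The structure and field names below are hypothetical and are not taken from ethdev_driver.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structure, used only to contrast the two comment styles. */
struct demo_dev_data {
	uint16_t port_id; /**< Trailing style: documents the member before it */

	/** Leading style: documents the member after it */
	uint32_t min_rx_buf_size;

	/**
	 * Multi-line leading style for longer descriptions.
	 * CONFIGURED(1) / NOT CONFIGURED(0)
	 */
	unsigned int dev_configured : 1;
};

/* Comments do not affect layout; this helper only makes the sketch usable. */
static inline int
demo_is_configured(const struct demo_dev_data *d)
{
	return d->dev_configured == 1;
}
```

The empty lines between the differently commented members mirror what patch 4/5 does to the real structures.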



[dpdk-dev] [PATCH 5/5] ethdev: remove full stop after short comments and references

2021-10-14 Thread Andrew Rybchenko
A full stop at the end of a short comment just makes the line longer. It
should be either everywhere or nowhere to be consistent.

Fixes: 9f3eb8826450 ("ethdev: hide eth dev related structures")

Signed-off-by: Andrew Rybchenko 
---
 lib/ethdev/ethdev_driver.h | 68 +++---
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0dd5dc6f61..756b2ba3f9 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -42,23 +42,23 @@ struct rte_eth_rxtx_callback {
  * process, while the actual configuration data for the device is shared.
  */
 struct rte_eth_dev {
-   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
-   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function */
+   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function */
 
-   /** Pointer to PMD transmit prepare function. */
+   /** Pointer to PMD transmit prepare function */
eth_tx_prep_t tx_pkt_prepare;
-   /** Get the number of used Rx descriptors. */
+   /** Get the number of used Rx descriptors */
eth_rx_queue_count_t rx_queue_count;
-   /** Check the status of a Rx descriptor. */
+   /** Check the status of a Rx descriptor */
eth_rx_descriptor_status_t rx_descriptor_status;
-   /** Check the status of a Tx descriptor. */
+   /** Check the status of a Tx descriptor */
eth_tx_descriptor_status_t tx_descriptor_status;
 
/**
-* Device data that is shared between primary and secondary processes.
+* Device data that is shared between primary and secondary processes
 */
struct rte_eth_dev_data *data;
-   void *process_private; /**< Pointer to per-process device data. */
+   void *process_private; /**< Pointer to per-process device data */
const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
struct rte_device *device; /**< Backing device */
struct rte_intr_handle *intr_handle; /**< Device interrupt handle */
@@ -72,7 +72,7 @@ struct rte_eth_dev {
struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
/**
 * User-supplied functions called from tx_burst to pre-process
-* received packets before passing them to the driver for transmission.
+* received packets before passing them to the driver for transmission
 */
struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 
@@ -92,28 +92,28 @@ struct rte_eth_dev_owner;
 struct rte_eth_dev_data {
char name[RTE_ETH_NAME_MAX_LEN]; /**< Unique identifier name */
 
-   void **rx_queues; /**< Array of pointers to Rx queues. */
-   void **tx_queues; /**< Array of pointers to Tx queues. */
-   uint16_t nb_rx_queues; /**< Number of Rx queues. */
-   uint16_t nb_tx_queues; /**< Number of Tx queues. */
+   void **rx_queues; /**< Array of pointers to Rx queues */
+   void **tx_queues; /**< Array of pointers to Tx queues */
+   uint16_t nb_rx_queues; /**< Number of Rx queues */
+   uint16_t nb_tx_queues; /**< Number of Tx queues */
 
struct rte_eth_dev_sriov sriov;/**< SRIOV data */
 
-   /** PMD-specific private data. @see rte_eth_dev_release_port(). */
+   /** PMD-specific private data. @see rte_eth_dev_release_port() */
void *dev_private;
 
-   struct rte_eth_link dev_link;   /**< Link-level information & status. */
-   struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
-   uint16_t mtu;   /**< Maximum Transmission Unit. */
+   struct rte_eth_link dev_link;   /**< Link-level information & status */
+   struct rte_eth_conf dev_conf;   /**< Configuration applied to device */
+   uint16_t mtu;   /**< Maximum Transmission Unit */
 
-   /** Common Rx buffer size handled by all queues. */
+   /** Common Rx buffer size handled by all queues */
uint32_t min_rx_buf_size;
 
-   uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures. */
+   uint64_t rx_mbuf_alloc_failed; /**< Rx ring mbuf allocation failures */
 
-   /** Device Ethernet link address. @see rte_eth_dev_release_port(). */
+   /** Device Ethernet link address. @see rte_eth_dev_release_port() */
struct rte_ether_addr *mac_addrs;
-   /** Bitmap associating MAC addresses to pools. */
+   /** Bitmap associating MAC addresses to pools */
uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
/**
 * Device Ethernet MAC addresses of hash filtering.
@@ -121,37 +121,37 @@ struct rte_eth_dev_data {
 */
struct rte_ether_addr *hash_mac_addrs;
 
-   uint16_t port_id;   /**< Device [external] port identifier. */
+   uint16_t port_id;   /

Re: [dpdk-dev] DPDK driver autoloading?

2021-10-14 Thread Bruce Richardson
On Wed, Oct 13, 2021 at 09:52:18AM -0700, Stephen Hemminger wrote:
> The current DPDK PCI code requires that all PMD shared libraries
> be loaded before probing.  This is a burden for applications that run
> on multiple platforms and a total mess for Linux distributions.
> 
> A better way would be to have the bus scanning code autoload
> drivers as needed. This would work like the Linux kernel
> module loading. 
> 
> Could the existing pmdinfogen mechanism be extended to do this?

Can you clarify a bit more about what the problem is here? How does EAL
loading the modules first make things a mess for distributions, compared to
loading them on-demand? With both schemes you naturally need to ensure all
needed drivers are present, and can remove any unneeded drivers. The only
difference is that with our current scheme some unneeded drivers which are
present will be loaded at runtime but idle/unused - hardly a massive
impact, I would think. What am I missing here?

/Bruce
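
To make the on-demand scheme under discussion concrete, here is a minimal sketch, assuming a hypothetical table that maps PCI vendor/device IDs to driver library names. Nothing here is an existing EAL API: the `demo_` names are invented for illustration, the table entries are only examples, and a real implementation would presumably build the table from pmdinfogen output and hand the result to dlopen().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ID-to-library mapping; entries are illustrative only. */
struct demo_pci_alias {
	uint16_t vendor_id;
	uint16_t device_id;
	const char *lib_name;
};

static const struct demo_pci_alias demo_aliases[] = {
	{ 0x8086, 0x1572, "librte_net_i40e.so" },
	{ 0x15b3, 0x1017, "librte_net_mlx5.so" },
};

/*
 * Return the library that claims the scanned device, or NULL if no
 * known driver matches. A bus scan would call this once per device
 * and load only the libraries that are actually needed.
 */
static const char *
demo_find_driver_lib(uint16_t vendor_id, uint16_t device_id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_aliases) / sizeof(demo_aliases[0]); i++) {
		const struct demo_pci_alias *a = &demo_aliases[i];

		if (a->vendor_id == vendor_id && a->device_id == device_id)
			return a->lib_name;
	}
	return NULL;
}
```

With the current scheme every present driver library is loaded up front; with a lookup like this, only libraries whose IDs match a scanned device would be loaded.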


[dpdk-dev] [PATCH] net/mlx5: close tools socket with the last device

2021-10-14 Thread Dmitry Kozlyuk
MLX5 PMD exposes a socket for external tools to dump port state.
Socket events are listened for using an interrupt source of EXT type.
The socket was closed and the interrupt callback was unregistered
at program exit, which is incorrect because DPDK could be already
shut down at this point. Move actions performed at program exit
to the moment the last MLX5 port is closed. The socket will be opened
again if later a new MLX5 device is plugged in and probed.
Also fix comments that were misleadingly talking
about secondary processes instead of external tools.

Reported-by: Harman Kalra 
Signed-off-by: Dmitry Kozlyuk 
Acked-by: Thomas Monjalon 
---
 drivers/net/mlx5/linux/mlx5_os.c |  9 +
 drivers/net/mlx5/linux/mlx5_socket.c | 12 +++-
 drivers/net/mlx5/mlx5.c  |  6 --
 drivers/net/mlx5/mlx5.h  |  2 ++
 drivers/net/mlx5/windows/mlx5_os.c   |  8 
 5 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..28db0827d5 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2793,6 +2793,15 @@ mlx5_os_net_probe(struct rte_device *dev)
return mlx5_os_auxiliary_probe(dev);
 }
 
+/**
+ * Cleanup resources when the last device is closed.
+ */
+void
+mlx5_os_net_cleanup(void)
+{
+   mlx5_pmd_socket_uninit();
+}
+
 static int
 mlx5_config_doorbell_mapping_env(const struct mlx5_dev_config *config)
 {
diff --git a/drivers/net/mlx5/linux/mlx5_socket.c b/drivers/net/mlx5/linux/mlx5_socket.c
index 6356b66dc4..902b8ec934 100644
--- a/drivers/net/mlx5/linux/mlx5_socket.c
+++ b/drivers/net/mlx5/linux/mlx5_socket.c
@@ -167,10 +167,7 @@ mlx5_pmd_interrupt_handler_uninstall(void)
 }
 
 /**
- * Initialise the socket to communicate with the secondary process
- *
- * @param[in] dev
- *   Pointer to Ethernet device.
+ * Initialise the socket to communicate with external tools.
  *
  * @return
  *   0 on success, a negative value otherwise.
@@ -187,10 +184,6 @@ mlx5_pmd_socket_init(void)
MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
if (server_socket)
return 0;
-   /*
-* Initialize the socket to communicate with the secondary
-* process.
-*/
ret = socket(AF_UNIX, SOCK_STREAM, 0);
if (ret < 0) {
DRV_LOG(WARNING, "Failed to open mlx5 socket: %s",
@@ -237,7 +230,8 @@ mlx5_pmd_socket_init(void)
 /**
  * Un-Initialize the pmd socket
  */
-RTE_FINI(mlx5_pmd_socket_uninit)
+void
+mlx5_pmd_socket_uninit(void)
 {
if (!server_socket)
return;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 45ccfe2784..497cf52787 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1315,9 +1315,11 @@ mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
mlx5_mr_release_cache(&sh->share_cache);
/* Remove context from the global device list. */
LIST_REMOVE(sh, next);
-   /* Release flow workspaces objects on the last device. */
-   if (LIST_EMPTY(&mlx5_dev_ctx_list))
+   /* Release resources on the last device removal. */
+   if (LIST_EMPTY(&mlx5_dev_ctx_list)) {
+   mlx5_os_net_cleanup();
mlx5_flow_os_release_workspace();
+   }
pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
/*
 *  Ensure there is no async event handler installed.
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3581414b78..08eeddaddc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1739,6 +1739,7 @@ int mlx5_mp_os_req_queue_control(struct rte_eth_dev *dev, uint16_t queue_id,
 /* mlx5_socket.c */
 
 int mlx5_pmd_socket_init(void);
+void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
@@ -1787,6 +1788,7 @@ int mlx5_os_set_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_os_set_allmulti(struct rte_eth_dev *dev, int enable);
 int mlx5_os_set_nonblock_channel_fd(int fd);
 void mlx5_os_mac_addr_flush(struct rte_eth_dev *dev);
+void mlx5_os_net_cleanup(void);
 
 /* mlx5_txpp.c */
 
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index 26fa927039..bcfb26c57f 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -1185,4 +1185,12 @@ mlx5_os_get_pdn(void *pd, uint32_t *pdn)
return 0;
 }
 
+/**
+ * Cleanup resources when the last device is closed.
+ */
+void
+mlx5_os_net_cleanup(void)
+{
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {0};
-- 
2.25.1
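
The idea in the commit message above, releasing a shared resource when the last device goes away instead of at program exit, can be sketched in isolation. This is a simplified illustration, not the driver's code: it uses a plain counter where the patch checks whether mlx5_dev_ctx_list is empty, a boolean stands in for the tools socket, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int demo_open_devices;  /* number of currently open devices */
static bool demo_socket_open;  /* stands in for the tools socket */

static void demo_socket_init(void)   { demo_socket_open = true; }
static void demo_socket_uninit(void) { demo_socket_open = false; }

/* Open the shared socket when the first device is probed. */
static void
demo_dev_probe(void)
{
	if (demo_open_devices++ == 0)
		demo_socket_init();
}

/* Close the shared socket when the last device is closed. */
static void
demo_dev_close(void)
{
	assert(demo_open_devices > 0);
	if (--demo_open_devices == 0)
		demo_socket_uninit();
}
```

Because cleanup is tied to the device lifecycle rather than to an exit-time destructor, it runs while the rest of the framework is still usable, and a later probe can open the socket again.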



Re: [dpdk-dev] [RFC 01/14] vhost: move async data in a dedicated structure

2021-10-14 Thread Maxime Coquelin




On 10/14/21 05:24, Hu, Jiayu wrote:

Hi Maxime,


-Original Message-
From: Maxime Coquelin 
Sent: Friday, October 8, 2021 6:00 AM
To: dev@dpdk.org; Xia, Chenbo ; Hu, Jiayu
; Wang, YuanX ; Ma,
WenwuX ; Richardson, Bruce
; Mcnamara, John

Cc: Maxime Coquelin 
Subject: [RFC 01/14] vhost: move async data in a dedicated structure

This patch moves async-related metadata from vhost_virtqueue to a
dedicated struct. It makes it clear which fields are async related, and also
saves some memory when async feature is not in use.

Signed-off-by: Maxime Coquelin 
---
  lib/vhost/vhost.c  | 129 -
  lib/vhost/vhost.h  |  53 -
  lib/vhost/vhost_user.c |   4 +-
  lib/vhost/virtio_net.c | 114 +++-
  4 files changed, 140 insertions(+), 160 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 9540522dac..58f72b633c 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -340,19 +340,15 @@ cleanup_device(struct virtio_net *dev, int destroy)
 static void
 vhost_free_async_mem(struct vhost_virtqueue *vq)
 {
-   rte_free(vq->async_pkts_info);
+   rte_free(vq->async->pkts_info);


Apps may explicitly unregister async in vring_state_changed() when the
vring is disabled. In this case, rte_vhost_async_channel_unregister()
calls vhost_free_async_mem() first, so vq->async becomes NULL.
Then, when the device is destroyed, free_vq() calls vhost_free_async_mem()
again, and "rte_free(vq->async->pkts_info)" dereferences a NULL pointer,
which causes a segmentation fault.


Good catch, I'll add a check of vq->async before dereferencing it.
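
A minimal sketch of what such a check could look like, so that the free routine is safe to call twice. This is an assumption about the shape of the fix, not the actual follow-up patch: the `demo_` types are simplified stand-ins for vhost_virtqueue and vhost_async, and plain free() replaces rte_free() to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the vhost structures (hypothetical). */
struct demo_async {
	void *pkts_info;
	void *buffers_packed;
	void *descs_split;
};

struct demo_vq {
	struct demo_async *async;
};

/* Safe to call repeatedly: a second call sees async == NULL and returns. */
static void
demo_free_async_mem(struct demo_vq *vq)
{
	if (vq->async == NULL)
		return;

	free(vq->async->pkts_info);
	free(vq->async->buffers_packed);
	free(vq->async->descs_split);

	free(vq->async);
	vq->async = NULL;
}
```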



-   rte_free(vq->async_buffers_packed);
-   vq->async_buffers_packed = NULL;
-   rte_free(vq->async_descs_split);
-   vq->async_descs_split = NULL;
+   rte_free(vq->async->buffers_packed);
+   vq->async->buffers_packed = NULL;
+   rte_free(vq->async->descs_split);
+   vq->async->descs_split = NULL;

-   rte_free(vq->it_pool);
-   rte_free(vq->vec_pool);
-
-   vq->async_pkts_info = NULL;
-   vq->it_pool = NULL;
-   vq->vec_pool = NULL;
+   rte_free(vq->async);
+   vq->async = NULL;
  }

  void
@@ -1629,77 +1625,63 @@ async_channel_register(int vid, uint16_t queue_id,
 {
struct virtio_net *dev = get_device(vid);
struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+   struct vhost_async *async;
+   int node = vq->numa_node;

-   if (unlikely(vq->async_registered)) {
+   if (unlikely(vq->async)) {
VHOST_LOG_CONFIG(ERR,
-   "async register failed: channel already registered "
-   "(vid %d, qid: %d)\n", vid, queue_id);
+   "async register failed: already registered (vid %d, qid: %d)\n",
+   vid, queue_id);
return -1;
}

-   vq->async_pkts_info = rte_malloc_socket(NULL,
-   vq->size * sizeof(struct async_inflight_info),
-   RTE_CACHE_LINE_SIZE, vq->numa_node);
-   if (!vq->async_pkts_info) {
-   vhost_free_async_mem(vq);
-   VHOST_LOG_CONFIG(ERR,
-   "async register failed: cannot allocate memory for async_pkts_info "
-   "(vid %d, qid: %d)\n", vid, queue_id);
+   async = rte_zmalloc_socket(NULL, sizeof(struct vhost_async), 0, node);
+   if (!async) {
+   VHOST_LOG_CONFIG(ERR, "failed to allocate async metadata (vid %d, qid: %d)\n",
+   vid, queue_id);
return -1;
}

-   vq->it_pool = rte_malloc_socket(NULL,
-   VHOST_MAX_ASYNC_IT * sizeof(struct rte_vhost_iov_iter),
-   RTE_CACHE_LINE_SIZE, vq->numa_node);
-   if (!vq->it_pool) {
-   vhost_free_async_mem(vq);
-   VHOST_LOG_CONFIG(ERR,
-   "async register failed: cannot allocate memory for it_pool "
-   "(vid %d, qid: %d)\n", vid, queue_id);
-   return -1;
-   }
-
-   vq->vec_pool = rte_malloc_socket(NULL,
-   VHOST_MAX_ASYNC_VEC * sizeof(struct iovec),
-   RTE_CACHE_LINE_SIZE, vq->numa_node);
-   if (!vq->vec_pool) {
-   vhost_free_async_mem(vq);
-   VHOST_LOG_CONFIG(ERR,
-   "async register failed: cannot allocate memory for vec_pool "
-   "(vid %d, qid: %d)\n", vid, queue_id);
-   return -1;
+   async->pkts_info = rte_malloc_socket(NULL, vq->size * sizeof(struct async_inflight_info),
+   RTE_CACHE_LINE_SIZE, node);
+   if (async->pkts_info) {


It should be "if (!async->pkts_info)".


Correct.

You can notice I didn't lie when I said it was only compile-tested in
the cover-letter! :)
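
For reference, the corrected check can be sketched as below: the error path must be taken when the allocation fails, and the partially built context must be released before returning. The `demo_` names are hypothetical, and calloc()/free() stand in for rte_zmalloc_socket()/rte_malloc_socket()/rte_free(), so this only illustrates the control flow, not the actual vhost code.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the async metadata (hypothetical). */
struct demo_async {
	void *pkts_info;
};

/* Return a fully allocated context, or NULL on any allocation failure. */
static struct demo_async *
demo_async_alloc(size_t ring_size, size_t elt_size)
{
	struct demo_async *async = calloc(1, sizeof(*async));

	if (!async)
		return NULL;

	async->pkts_info = calloc(ring_size, elt_size);
	if (!async->pkts_info) { /* note the '!': fail only when NULL */
		free(async);
		return NULL;
	}

	return async;
}
```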

Regards,
Maxime


Thanks,
Jiayu





[dpdk-dev] [PATCH v2] net/mlx5: close tools socket with the last device

2021-10-14 Thread Dmitry Kozlyuk
MLX5 PMD exposes a socket for external tools to dump port state.
Socket events are listened for using an interrupt source of EXT type.
The socket was closed and the interrupt callback was unregistered
at program exit, which is incorrect because DPDK could be already
shut down at this point. Move actions performed at program exit
to the moment the last MLX5 port is closed. The socket will be opened
again if later a new MLX5 device is plugged in and probed.
Also fix comments that were misleadingly talking
about secondary processes instead of external tools.

Fixes: e6cdc54cc0ef ("net/mlx5: add socket server for external tools")
Cc: Xueming Li 
Cc: sta...@dpdk.org

Reported-by: Harman Kalra 
Signed-off-by: Dmitry Kozlyuk 
Acked-by: Thomas Monjalon 
---
v2: add Fixes tag

 drivers/net/mlx5/linux/mlx5_os.c |  9 +
 drivers/net/mlx5/linux/mlx5_socket.c | 12 +++-
 drivers/net/mlx5/mlx5.c  |  6 --
 drivers/net/mlx5/mlx5.h  |  2 ++
 drivers/net/mlx5/windows/mlx5_os.c   |  8 
 5 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..28db0827d5 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2793,6 +2793,15 @@ mlx5_os_net_probe(struct rte_device *dev)
return mlx5_os_auxiliary_probe(dev);
 }
 
+/**
+ * Cleanup resources when the last device is closed.
+ */
+void
+mlx5_os_net_cleanup(void)
+{
+   mlx5_pmd_socket_uninit();
+}
+
 static int
 mlx5_config_doorbell_mapping_env(const struct mlx5_dev_config *config)
 {
diff --git a/drivers/net/mlx5/linux/mlx5_socket.c b/drivers/net/mlx5/linux/mlx5_socket.c
index 6356b66dc4..902b8ec934 100644
--- a/drivers/net/mlx5/linux/mlx5_socket.c
+++ b/drivers/net/mlx5/linux/mlx5_socket.c
@@ -167,10 +167,7 @@ mlx5_pmd_interrupt_handler_uninstall(void)
 }
 
 /**
- * Initialise the socket to communicate with the secondary process
- *
- * @param[in] dev
- *   Pointer to Ethernet device.
+ * Initialise the socket to communicate with external tools.
  *
  * @return
  *   0 on success, a negative value otherwise.
@@ -187,10 +184,6 @@ mlx5_pmd_socket_init(void)
MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
if (server_socket)
return 0;
-   /*
-* Initialize the socket to communicate with the secondary
-* process.
-*/
ret = socket(AF_UNIX, SOCK_STREAM, 0);
if (ret < 0) {
DRV_LOG(WARNING, "Failed to open mlx5 socket: %s",
@@ -237,7 +230,8 @@ mlx5_pmd_socket_init(void)
 /**
  * Un-Initialize the pmd socket
  */
-RTE_FINI(mlx5_pmd_socket_uninit)
+void
+mlx5_pmd_socket_uninit(void)
 {
if (!server_socket)
return;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 45ccfe2784..497cf52787 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1315,9 +1315,11 @@ mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
mlx5_mr_release_cache(&sh->share_cache);
/* Remove context from the global device list. */
LIST_REMOVE(sh, next);
-   /* Release flow workspaces objects on the last device. */
-   if (LIST_EMPTY(&mlx5_dev_ctx_list))
+   /* Release resources on the last device removal. */
+   if (LIST_EMPTY(&mlx5_dev_ctx_list)) {
+   mlx5_os_net_cleanup();
mlx5_flow_os_release_workspace();
+   }
pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
/*
 *  Ensure there is no async event handler installed.
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3581414b78..08eeddaddc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1739,6 +1739,7 @@ int mlx5_mp_os_req_queue_control(struct rte_eth_dev *dev, uint16_t queue_id,
 /* mlx5_socket.c */
 
 int mlx5_pmd_socket_init(void);
+void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
@@ -1787,6 +1788,7 @@ int mlx5_os_set_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_os_set_allmulti(struct rte_eth_dev *dev, int enable);
 int mlx5_os_set_nonblock_channel_fd(int fd);
 void mlx5_os_mac_addr_flush(struct rte_eth_dev *dev);
+void mlx5_os_net_cleanup(void);
 
 /* mlx5_txpp.c */
 
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index 26fa927039..bcfb26c57f 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -1185,4 +1185,12 @@ mlx5_os_get_pdn(void *pd, uint32_t *pdn)
return 0;
 }
 
+/**
+ * Cleanup resources when the last device is closed.
+ */
+void
+mlx5_os_net_cleanup(void)
+{
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {0};
-- 
2.25.1



[dpdk-dev] [PATCH v6] app/testpmd: add option to display extended statistics

2021-10-14 Thread Andrew Rybchenko
From: Ivan Ilchenko 

Add a 'display-xstats' option, to be used together with Rx/Tx statistics
(i.e. the 'stats-period' option or the 'show port stats' interactive command),
to display a specified list of extended statistics.

Signed-off-by: Ivan Ilchenko 
Signed-off-by: Andrew Rybchenko 
Acked-by: Ajit Khaparde 
---
v6:
- process more review notes from Ferruh

v5:
- process review notes from Ferruh

v4:
- split from patch series
- move xstats information to rte_port structure to avoid
  extra global structure

 app/test-pmd/config.c |  62 ++
 app/test-pmd/parameters.c |  81 ++
 app/test-pmd/testpmd.c| 116 ++
 app/test-pmd/testpmd.h|  15 
 doc/guides/testpmd_app_ug/run_app.rst |   6 ++
 5 files changed, 280 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7221644230..7b1605b362 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -175,6 +175,65 @@ print_ethaddr(const char *name, struct rte_ether_addr *eth_addr)
printf("%s%s", name, buf);
 }
 
+static void
+nic_xstats_display_periodic(portid_t port_id)
+{
+   struct xstat_display_info *xstats_info;
+   uint64_t *prev_values, *curr_values;
+   uint64_t diff_value, value_rate;
+   struct timespec cur_time;
+   uint64_t *ids_supp;
+   size_t ids_supp_sz;
+   uint64_t diff_ns;
+   unsigned int i;
+   int rc;
+
+   xstats_info = &ports[port_id].xstats_info;
+
+   ids_supp_sz = xstats_info->ids_supp_sz;
+   if (ids_supp_sz == 0)
+   return;
+
+   printf("\n");
+
+   ids_supp = xstats_info->ids_supp;
+   prev_values = xstats_info->prev_values;
+   curr_values = xstats_info->curr_values;
+
+   rc = rte_eth_xstats_get_by_id(port_id, ids_supp, curr_values,
+ ids_supp_sz);
+   if (rc != (int)ids_supp_sz) {
+   fprintf(stderr,
+   "Failed to get values of %zu xstats for port %u - return code %d\n",
+   ids_supp_sz, port_id, rc);
+   return;
+   }
+
+   diff_ns = 0;
+   if (clock_gettime(CLOCK_TYPE_ID, &cur_time) == 0) {
+   uint64_t ns;
+
+   ns = cur_time.tv_sec * NS_PER_SEC;
+   ns += cur_time.tv_nsec;
+
+   if (xstats_info->prev_ns != 0)
+   diff_ns = ns - xstats_info->prev_ns;
+   xstats_info->prev_ns = ns;
+   }
+
+   printf("%-31s%-17s%s\n", " ", "Value", "Rate (since last show)");
+   for (i = 0; i < ids_supp_sz; i++) {
+   diff_value = (curr_values[i] > prev_values[i]) ?
+(curr_values[i] - prev_values[i]) : 0;
+   prev_values[i] = curr_values[i];
+   value_rate = diff_ns > 0 ?
+   (double)diff_value / diff_ns * NS_PER_SEC : 0;
+
+   printf("  %-25s%12"PRIu64" %15"PRIu64"\n",
+  xstats_display[i].name, curr_values[i], value_rate);
+   }
+}
+
 void
 nic_stats_display(portid_t port_id)
 {
@@ -245,6 +304,9 @@ nic_stats_display(portid_t port_id)
   PRIu64"  Tx-bps: %12"PRIu64"\n", mpps_rx, mbps_rx * 8,
   mpps_tx, mbps_tx * 8);
 
+   if (xstats_display_num > 0)
+   nic_xstats_display_periodic(port_id);
+
printf("  %s%s\n",
   nic_stats_border, nic_stats_border);
 }
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 3f94a82e32..b3217d6e5c 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -61,6 +61,10 @@ usage(char* progname)
   "(only if interactive is disabled).\n");
printf("  --stats-period=PERIOD: statistics will be shown "
   "every PERIOD seconds (only if interactive is disabled).\n");
+   printf("  --display-xstats xstat_name1[,...]: comma-separated list of "
+  "extended statistics to show. Used with --stats-period "
+  "specified or interactive commands that show Rx/Tx statistics "
+  "(i.e. 'show port stats').\n");
printf("  --nb-cores=N: set the number of forwarding cores "
   "(1 <= N <= %d).\n", nb_lcores);
printf("  --nb-ports=N: set the number of forwarding ports "
@@ -473,6 +477,72 @@ parse_event_printing_config(const char *optarg, int enable)
return 0;
 }
 
+static int
+parse_xstats_list(const char *in_str, struct rte_eth_xstat_name **xstats,
+ unsigned int *xstats_num)
+{
+   int max_names_nb, names_nb, nonempty_names_nb;
+   int name, nonempty_name;
+   int stringlen;
+   char **names;
+   char *str;
+   int ret;
+   int i;
+
+   names = NULL;
+   str = strdup(in_str);
+   if (str == NULL) {
+   ret = -ENOMEM;
+   goto out;
+   }
+

Re: [dpdk-dev] [PATCH v3 01/14] eventdev: make driver interface as internal

2021-10-14 Thread Jerin Jacob
On Wed, Oct 6, 2021 at 12:21 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> Mark all the driver specific functions as internal, remove
> `rte` prefix from `struct rte_eventdev_ops`.
> Remove experimental tag from internal functions.
> Remove `eventdev_pmd.h` from non-internal header files.
>
> Signed-off-by: Pavan Nikhilesh 
> ---
>  v3 Changes:
>  - Reset fp_ops when device is torndown.
>  - Add `event_dev_probing_finish()` this function is used for
>post-initialization processing. In current usecase we use it to
>initialize fastpath ops.
>
>  v2 Changes:
>  - Rework inline flat array by adding port data into it.
>  - Rearrange rte_event_timer elements.


There is a rebase issue with next-eventdev. Please rebase.

[for-main]dell[dpdk-next-eventdev] $ git pw series apply 19405

Applying: eventdev: make driver interface as internal
Using index info to reconstruct a base tree...
M   drivers/event/cnxk/cn10k_eventdev.c
M   drivers/event/cnxk/cn9k_eventdev.c
M   lib/eventdev/eventdev_pmd.h
M   lib/eventdev/rte_event_crypto_adapter.h
M   lib/eventdev/version.map
Falling back to patching base and 3-way merge...
Auto-merging lib/eventdev/version.map
CONFLICT (content): Merge conflict in lib/eventdev/version.map
Auto-merging lib/eventdev/rte_event_crypto_adapter.h
Auto-merging lib/eventdev/eventdev_pmd.h
Auto-merging drivers/event/cnxk/cn9k_eventdev.c
CONFLICT (content): Merge conflict in drivers/event/cnxk/cn9k_eventdev.c
Auto-merging drivers/event/cnxk/cn10k_eventdev.c
CONFLICT (content): Merge conflict in drivers/event/cnxk/cn10k_eventdev.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 eventdev: make driver interface as internal
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Re: [dpdk-dev] [PATCH v3 01/14] eventdev: make driver interface as internal

2021-10-14 Thread Jerin Jacob
On Wed, Oct 6, 2021 at 12:21 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> Mark all the driver specific functions as internal, remove
> `rte` prefix from `struct rte_eventdev_ops`.
> Remove experimental tag from internal functions.
> Remove `eventdev_pmd.h` from non-internal header files.
>
> Signed-off-by: Pavan Nikhilesh 
> ---
>  v3 Changes:
>  - Reset fp_ops when device is torndown.
>  - Add `event_dev_probing_finish()` this function is used for
>post-initialization processing. In current usecase we use it to
>initialize fastpath ops.
>
>  v2 Changes:
>  - Rework inline flat array by adding port data into it.
>  - Rearrange rte_event_timer elements.
>
>  drivers/event/cnxk/cn10k_eventdev.c|  6 ++---
>  drivers/event/cnxk/cn9k_eventdev.c | 10 -
>  drivers/event/dlb2/dlb2.c  |  2 +-
>  drivers/event/dpaa/dpaa_eventdev.c |  2 +-
>  drivers/event/dpaa2/dpaa2_eventdev.c   |  2 +-
>  drivers/event/dsw/dsw_evdev.c  |  2 +-
>  drivers/event/octeontx/ssovf_evdev.c   |  2 +-
>  drivers/event/octeontx/ssovf_worker.c  |  4 ++--
>  drivers/event/octeontx2/otx2_evdev.c   | 26 +++---
>  drivers/event/opdl/opdl_evdev.c|  2 +-
>  drivers/event/skeleton/skeleton_eventdev.c |  2 +-
>  drivers/event/sw/sw_evdev.c|  2 +-
>  lib/eventdev/eventdev_pmd.h|  6 -
>  lib/eventdev/eventdev_pmd_pci.h|  4 +++-
>  lib/eventdev/eventdev_pmd_vdev.h   |  2 ++
>  lib/eventdev/meson.build   |  6 +
>  lib/eventdev/rte_event_crypto_adapter.h|  1 -
>  lib/eventdev/rte_eventdev.h| 25 -
>  lib/eventdev/version.map   | 17 +++--

Please update the release notes for API and ABI changes.


Re: [dpdk-dev] [RFC 02/15] eventdev: separate internal structures

2021-10-14 Thread Jerin Jacob
On Tue, Aug 24, 2021 at 1:10 AM  wrote:
>
> From: Pavan Nikhilesh 
>
> Create rte_eventdev_core.h and move all the internal data structures
> to this file. These structures are mostly used by drivers, but they
> need to be in the public header file as they are accessed by datapath
> inline functions for performance reasons.
> The accessibility of these data structures is not changed.
>
> Signed-off-by: Pavan Nikhilesh 
> ---
>  lib/eventdev/eventdev_pmd.h  |   3 -
>  lib/eventdev/meson.build |   3 +
>  lib/eventdev/rte_eventdev.h  | 715 +--
>  lib/eventdev/rte_eventdev_core.h | 144 +++
>  4 files changed, 443 insertions(+), 422 deletions(-)
>  create mode 100644 lib/eventdev/rte_eventdev_core.h

Please validate the Doxygen output.


>
> diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
> index 5dab9e2f70..a25d3f1fb5 100644
> --- a/lib/eventdev/eventdev_pmd.h
> +++ b/lib/eventdev/eventdev_pmd.h
> @@ -91,9 +91,6 @@ struct rte_eventdev_global {
> uint8_t nb_devs;/**< Number of devices found */
>  };
>
> -extern struct rte_eventdev *rte_eventdevs;
> -/** The pool of rte_eventdev structures. */
> -
>  /**
>   * Get the rte_eventdev structure device pointer for the named device.
>   *
> diff --git a/lib/eventdev/meson.build b/lib/eventdev/meson.build
> index 523ea9ccae..8b51fde361 100644
> --- a/lib/eventdev/meson.build
> +++ b/lib/eventdev/meson.build
> @@ -27,6 +27,9 @@ headers = files(
>  'rte_event_crypto_adapter.h',
>  'rte_event_eth_tx_adapter.h',
>  )
> +indirect_headers += files(
> +'rte_eventdev_core.h',
> +)
>  driver_sdk_headers += files(
>  'eventdev_pmd.h',
>  'eventdev_pmd_pci.h',
> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> index 6ba116002f..1b11d4576d 100644
> --- a/lib/eventdev/rte_eventdev.h
> +++ b/lib/eventdev/rte_eventdev.h
> @@ -1324,314 +1324,6 @@ int
>  rte_event_eth_tx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
> uint32_t *caps);
>
> -struct eventdev_ops;
> -struct rte_eventdev;
> -
> -typedef uint16_t (*event_enqueue_t)(void *port, const struct rte_event *ev);
> -/**< @internal Enqueue event on port of a device */
> -
> -typedef uint16_t (*event_enqueue_burst_t)(void *port,
> -   const struct rte_event ev[], uint16_t nb_events);
> -/**< @internal Enqueue burst of events on port of a device */
> -
> -typedef uint16_t (*event_dequeue_t)(void *port, struct rte_event *ev,
> -   uint64_t timeout_ticks);
> -/**< @internal Dequeue event from port of a device */
> -
> -typedef uint16_t (*event_dequeue_burst_t)(void *port, struct rte_event ev[],
> -   uint16_t nb_events, uint64_t timeout_ticks);
> -/**< @internal Dequeue burst of events from port of a device */
> -
> -typedef uint16_t (*event_tx_adapter_enqueue)(void *port,
> -   struct rte_event ev[], uint16_t nb_events);
> -/**< @internal Enqueue burst of events on port of a device */
> -
> -typedef uint16_t (*event_tx_adapter_enqueue_same_dest)(void *port,
> -   struct rte_event ev[], uint16_t nb_events);
> -/**< @internal Enqueue burst of events on port of a device supporting
> - * burst having same destination Ethernet port & Tx queue.
> - */
> -
> -typedef uint16_t (*event_crypto_adapter_enqueue)(void *port,
> -   struct rte_event ev[], uint16_t nb_events);
> -/**< @internal Enqueue burst of events on crypto adapter */
> -
> -#define RTE_EVENTDEV_NAME_MAX_LEN  (64)
> -/**< @internal Max length of name of event PMD */
> -
> -/**
> - * @internal
> - * The data part, with no function pointers, associated with each device.
> - *
> - * This structure is safe to place in shared memory to be common among
> - * different processes in a multi-process configuration.
> - */
> -struct rte_eventdev_data {
> -   int socket_id;
> -   /**< Socket ID where memory is allocated */
> -   uint8_t dev_id;
> -   /**< Device ID for this instance */
> -   uint8_t nb_queues;
> -   /**< Number of event queues. */
> -   uint8_t nb_ports;
> -   /**< Number of event ports. */
> -   void **ports;
> -   /**< Array of pointers to ports. */
> -   struct rte_event_port_conf *ports_cfg;
> -   /**< Array of port configuration structures. */
> -   struct rte_event_queue_conf *queues_cfg;
> -   /**< Array of queue configuration structures. */
> -   uint16_t *links_map;
> -   /**< Memory to store queues to port connections. */
> -   void *dev_private;
> -   /**< PMD-specific private data */
> -   uint32_t event_dev_cap;
> -   /**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/
> -   struct rte_event_dev_config dev_conf;
> -   /**< Configuration applied to device. */
> -   uint8_t service_inited;
> -   /* Service initialization state */
> -   uint32_t service_id;
> - 

Re: [dpdk-dev] [PATCH v3 04/14] eventdev: move inline APIs into separate structure

2021-10-14 Thread Jerin Jacob
On Wed, Oct 6, 2021 at 12:21 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> Move fastpath inline function pointers from rte_eventdev into a
> separate structure accessed via a flat array.
> The intension is to make rte_eventdev and related structures private

intention

> to avoid future API/ABI breakages.`
>
> Signed-off-by: Pavan Nikhilesh 
> Acked-by: Ray Kinsella 
> ---
>  lib/eventdev/eventdev_pmd.h  |  38 +++
>  lib/eventdev/eventdev_pmd_pci.h  |   4 +-
>  lib/eventdev/eventdev_private.c  | 112 +++
>  lib/eventdev/meson.build |   1 +
>  lib/eventdev/rte_eventdev.c  |  22 +-
>  lib/eventdev/rte_eventdev_core.h |  28 
>  lib/eventdev/version.map |   6 ++
>  7 files changed, 209 insertions(+), 2 deletions(-)
>  create mode 100644 lib/eventdev/eventdev_private.c
>
>  sources = files(
> +'eventdev_private.c',
>  'rte_eventdev.c',
>  'rte_event_ring.c',
>  'eventdev_trace_points.c',

Since you are reworking, please sort this in alphabetical order.


>
> +struct rte_event_fp_ops {
> +   event_enqueue_t enqueue;
> +   /**< PMD enqueue function. */
> +   event_enqueue_burst_t enqueue_burst;
> +   /**< PMD enqueue burst function. */
> +   event_enqueue_burst_t enqueue_new_burst;
> +   /**< PMD enqueue burst new function. */
> +   event_enqueue_burst_t enqueue_forward_burst;
> +   /**< PMD enqueue burst fwd function. */
> +   event_dequeue_t dequeue;
> +   /**< PMD dequeue function. */
> +   event_dequeue_burst_t dequeue_burst;
> +   /**< PMD dequeue burst function. */
> +   event_tx_adapter_enqueue_t txa_enqueue;
> +   /**< PMD Tx adapter enqueue function. */
> +   event_tx_adapter_enqueue_t txa_enqueue_same_dest;
> +   /**< PMD Tx adapter enqueue same destination function. */
> +   event_crypto_adapter_enqueue_t ca_enqueue;
> +   /**< PMD Crypto adapter enqueue function. */
> +   uintptr_t reserved[2];
> +
> +   void **data;

Since every op needs access to data, please move it to the first position.
Also, you can merge reserved and reserved2 in that case.
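
To illustrate the layout being requested, here is a minimal sketch, not the actual patch. The `sketch_*` names and the simplified callback signatures are hypothetical stand-ins (the real typedefs take event arguments). It only shows `data` at offset 0 and the two reserved arrays folded into one, which keeps the structure at the same 16 pointer-sized words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the PMD callback typedefs. */
typedef uint16_t (*sketch_enqueue_t)(void *port);
typedef uint16_t (*sketch_dequeue_t)(void *port);

struct sketch_event_fp_ops {
	void **data;                 /* per-port data, read by every op */
	sketch_enqueue_t enqueue;
	sketch_enqueue_t enqueue_burst;
	sketch_enqueue_t enqueue_new_burst;
	sketch_enqueue_t enqueue_forward_burst;
	sketch_dequeue_t dequeue;
	sketch_dequeue_t dequeue_burst;
	sketch_enqueue_t txa_enqueue;
	sketch_enqueue_t txa_enqueue_same_dest;
	sketch_enqueue_t ca_enqueue;
	uintptr_t reserved[6];       /* reserved[2] and reserved2[4] merged */
};

/* Compile-time check that the hot pointer really is the first member. */
_Static_assert(offsetof(struct sketch_event_fp_ops, data) == 0,
	       "data must be the first member");
```

With `data` first, the pointer every datapath call dereferences sits at the start of the first cache line of the entry, next to the most frequently used enqueue/dequeue pointers.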


> +   /**< points to array of internal port data pointers */
> +   uintptr_t reserved2[4];
> +} __rte_cache_aligned;
> +
> +extern struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
> +
>  #define RTE_EVENTDEV_NAME_MAX_LEN (64)
>  /**< @internal Max length of name of event PMD */
>
> diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
> index 5f1fe412a4..a3a732089b 100644
> --- a/lib/eventdev/version.map
> +++ b/lib/eventdev/version.map
> @@ -85,6 +85,9 @@ DPDK_22 {
> rte_event_timer_cancel_burst;
> rte_eventdevs;
>
> +   #added in 21.11
> +   rte_event_fp_ops;
> +
> local: *;
>  };
>
> @@ -141,6 +144,9 @@ EXPERIMENTAL {
>  INTERNAL {
> global:
>
> +   event_dev_fp_ops_reset;
> +   event_dev_fp_ops_set;
> +   event_dev_probing_finish;
> rte_event_pmd_selftest_seqn_dynfield_offset;
> rte_event_pmd_allocate;
> rte_event_pmd_get_named_dev;
> --
> 2.17.1
>


Re: [dpdk-dev] [PATCH v3 05/14] drivers/event: invoke probing finish function

2021-10-14 Thread Jerin Jacob
On Wed, Oct 6, 2021 at 12:21 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> Invoke event_dev_probing_finish() functions at the end of probing,

functions -> function

> this function sets the function pointers in the fp_ops flat array.
>
> Signed-off-by: Pavan Nikhilesh 
> ---


Re: [dpdk-dev] [PATCH v3 14/14] eventdev: mark trace variables as internal

2021-10-14 Thread Jerin Jacob
On Wed, Oct 6, 2021 at 12:41 PM David Marchand
 wrote:
>
> Hello Pavan, Ray,
>
> On Wed, Oct 6, 2021 at 8:52 AM  wrote:
> >
> > From: Pavan Nikhilesh 
> >
> > Mark rte_trace global variables as internal i.e. remove them
> > from experimental section of version map.
> > Some of them are used in inline APIs, mark those as global.
> >
> > Signed-off-by: Pavan Nikhilesh 
> > Acked-by: Ray Kinsella 
>
> Please, sort those symbols.
> I check with ./devtools/update-abi.sh $(cat ABI_VERSION)
>
>
> > ---
> >  lib/eventdev/version.map | 77 ++--
> >  1 file changed, 35 insertions(+), 42 deletions(-)
> >
> > diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
> > index 068d186c66..617fff0ae6 100644
> > --- a/lib/eventdev/version.map
> > +++ b/lib/eventdev/version.map
> > @@ -88,57 +88,19 @@ DPDK_22 {
> > rte_event_vector_pool_create;
> > rte_eventdevs;
> >
> > -   #added in 21.11
> > -   rte_event_fp_ops;
> > -
> > -   local: *;
> > -};
> > -
> > -EXPERIMENTAL {
> > -   global:
> > -
> > # added in 20.05
>
> At the next ABI bump, ./devtools/update-abi.sh will strip those
> comments from the stable section.
> You can notice this when you run ./devtools/update-abi.sh $CURRENT_ABI
> as suggested above.

Please follow David's suggestion on sorting the map file.

> I would strip the comments now that the symbols are going to stable.
> Ray, do you have an opinion?
>
>
> --
> David Marchand
>


Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Harman Kalra
> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, October 14, 2021 1:53 PM
> To: Harman Kalra 
> Cc: dev@dpdk.org; Raslan Darawsheh ; Ray Kinsella
> ; Dmitry Kozlyuk ; David
> Marchand ; viachesl...@nvidia.com;
> ma...@nvidia.com
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement
> get set APIs
> 
> 13/10/2021 20:52, Thomas Monjalon:
> > 13/10/2021 19:57, Harman Kalra:
> > > From: dev  On Behalf Of Harman Kalra
> > > > From: Thomas Monjalon 
> > > > > 04/10/2021 11:57, David Marchand:
> > > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra
> > > > > > 
> > > > > wrote:
> > > > > > > > > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int
> size,
> > > > > > > > > +
> > > > > > > > > +bool
> > > > > > > > > +from_hugepage) {
> > > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > > +   int i;
> > > > > > > > > +
> > > > > > > > > +   if (from_hugepage)
> > > > > > > > > +   intr_handle = rte_zmalloc(NULL,
> > > > > > > > > + size * 
> > > > > > > > > sizeof(struct rte_intr_handle),
> > > > > > > > > + 0);
> > > > > > > > > +   else
> > > > > > > > > +   intr_handle = calloc(1, size *
> > > > > > > > > + sizeof(struct rte_intr_handle));
> > > > > > > >
> > > > > > > > We can call DPDK allocator in all cases.
> > > > > > > > That would avoid headaches on why multiprocess does not
> > > > > > > > work in some rarely tested cases.
> [...]
> > > > > I agree with David.
> > > > > I prefer a simpler API which always use rte_malloc, and make
> > > > > sure interrupts are always handled between rte_eal_init and
> rte_eal_cleanup.
> [...]
> > > > There are couple of more dependencies on glibc heap APIs:
> > > > 1. "rte_eal_alarm_init()" allocates an interrupt instance which is
> > > > used for timerfd, is called before "rte_eal_memory_init()" which
> > > > does the memseg init.
> > > > Not sure what all challenges we may face in moving alarm_init
> > > > after memory_init as it might break some subsystem inits.
> > > > Other option could be to allocate interrupt instance for timerfd
> > > > on first alarm_setup call.
> >
> > Indeed it is an issue.
> >
> > [...]
> >
> > > > There are many other drivers which statically declares the
> > > > interrupt handles inside their respective private structures and
> > > > memory for those structure was allocated from heap. For such
> > > > drivers I allocated interrupt instances also using glibc heap APIs.
> >
> > Could you use rte_malloc in these drivers?
> 
> If we take the direction of 2 different allocations mode for the interrupts, I
> suggest we make it automatic without any API parameter.
> We don't have any function to check rte_malloc readiness I think.
> But we can detect whether shared memory is ready with this check:
> rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC This check
> is true at the end of rte_eal_init, so it is false during probing.
> Would it be enough? Or should we implement rte_malloc_is_ready()?

Hi Thomas,

It's a very good suggestion. Let's implement "rte_malloc_is_ready()", which could be as
simple as a "rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC" check.
There may be more consumers for this API in the future.

If we make the detection automatic, should this alloc API still take an argument at all?
I added a 32-bit flags argument in the latest series, where each bit can represent an
allocation capability.
I used two bits to discriminate between glibc malloc and rte_malloc. Shall we keep it
or drop it?

David, Dmitry please share your thoughts.

Thanks
Harman


> 



Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread David Marchand
On Thu, Oct 14, 2021 at 11:31 AM Harman Kalra  wrote:
> If we are making it automatic detection, shall we now even have argument to 
> this alloc API?
> I added a flags argument (32 bit) in latest series where each bit of this 
> flag can be an allocation capability.
> I used two bits for discriminating between glibc malloc and rte_malloc. Shall 
> we keep it or drop it?
>
> David, Dmitry please share your thoughts.

I don't have ideas of how we would extend allocations of such object,
so I am unsure.

In doubt, I would keep this flags field, and validate it's always 0
(as mentioned in my reply on ABI).
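
A minimal sketch of what "keep the flags field and validate it's always 0" could look like. The function and structure names are hypothetical, and plain calloc() stands in for whichever allocator the final API uses.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical placeholder for the opaque interrupt handle. */
struct sketch_intr_instance {
	int fd;
};

/*
 * Hypothetical allocator: no flag bit is defined yet, so any non-zero
 * value is rejected. This keeps the argument available for later
 * extension without changing the function signature.
 */
static struct sketch_intr_instance *
sketch_intr_instance_alloc(uint32_t flags)
{
	if (flags != 0) {
		errno = EINVAL;
		return NULL;
	}
	return calloc(1, sizeof(struct sketch_intr_instance));
}
```

Rejecting unknown bits now means existing callers cannot start depending on values that a later release may give a meaning to.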


-- 
David Marchand



Re: [dpdk-dev] [PATCH v8 2/4] telemetry: fix socket path conflicts for in-memory mode

2021-10-14 Thread Kevin Traynor

On 12/10/2021 17:39, Bruce Richardson wrote:

When running using in-memory mode, multiple processes can use the same
runtime dir, leading to conflicts with the telemetry sockets in that
directory. We can resolve this by appending a suffix to each socket
beyond the first, with the suffix being an increasing counter value.
Each process uses the first unused socket counter value.

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")

Reported-by: David Marchand 
Signed-off-by: Bruce Richardson 
Acked-by: Ciara Power 


Thanks Bruce, lgtm. Nice solution to keep existing name where possible 
and to give as predictable as you can names for others.


Acked-by: Kevin Traynor 
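
For readers skimming the diff below, the naming scheme from the commit message can be sketched roughly as follows. This is an illustration only: the `sketch_*` helpers are hypothetical, an in-memory table stands in for live sockets, and the ":N" separator is an assumption, since the quoted patch is cut off before the format string.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_ACTIVE 8

/* Paths already owned by other processes (stand-in for live sockets). */
static const char *sketch_active[SKETCH_MAX_ACTIVE];

/* Stand-in for create_socket(): 0 on success, -EADDRINUSE if taken. */
static int
sketch_try_bind(const char *path)
{
	for (int i = 0; i < SKETCH_MAX_ACTIVE; i++)
		if (sketch_active[i] != NULL &&
		    strcmp(sketch_active[i], path) == 0)
			return -EADDRINUSE;
	return 0;
}

/* Try the base path first, then base:1, base:2, ... up to max_suffix. */
static int
sketch_pick_path(const char *base, int max_suffix, char *out, size_t len)
{
	int rc, n;

	n = snprintf(out, len, "%s", base);
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	rc = sketch_try_bind(out);
	for (int suffix = 1; rc == -EADDRINUSE && suffix <= max_suffix;
	     suffix++) {
		n = snprintf(out, len, "%s:%d", base, suffix);
		if (n < 0 || (size_t)n >= len)
			return -ENAMETOOLONG;
		rc = sketch_try_bind(out);
	}
	return rc;
}
```

The first process keeps the unsuffixed name, so existing clients keep working, and later processes get the lowest free counter value.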


---
  lib/telemetry/telemetry.c | 65 +--
  1 file changed, 49 insertions(+), 16 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 48f4c7ba46..a7483167d4 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -457,28 +457,45 @@ create_socket(char *path)
  
  	struct sockaddr_un sun = {.sun_family = AF_UNIX};

strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-   unlink(sun.sun_path);
+   TMTY_LOG(DEBUG, "Attempting socket bind to path '%s'\n", path);
+
if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
struct stat st;
  
-		TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));

-   if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode))
+   TMTY_LOG(DEBUG, "Initial bind to socket '%s' failed.\n", path);
+
+   /* first check if we have a runtime dir */
+   if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode)) {
TMTY_LOG(ERR, "Cannot access DPDK runtime directory: %s\n", socket_dir);
-   sun.sun_path[0] = 0;
-   goto error;
+   close(sock);
+   return -ENOENT;
+   }
+
+   /* check if current socket is active */
+   if (connect(sock, (void *)&sun, sizeof(sun)) == 0) {
+   close(sock);
+   return -EADDRINUSE;
+   }
+
+   /* socket is not active, delete and attempt rebind */
+   TMTY_LOG(DEBUG, "Attempting unlink and retrying bind\n");
+   unlink(sun.sun_path);
+   if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
+   TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
+   close(sock);
+   return -errno; /* if unlink failed, this will be -EADDRINUSE as above */
+   }
}
  
  	if (listen(sock, 1) < 0) {

TMTY_LOG(ERR, "Error calling listen for socket: %s\n", strerror(errno));
-   goto error;
+   unlink(sun.sun_path);
+   close(sock);
+   return -errno;
}
+   TMTY_LOG(DEBUG, "Socket creation and binding ok\n");
  
  	return sock;

-
-error:
-   close(sock);
-   unlink_sockets();
-   return -1;
  }
  
  static void

@@ -511,8 +528,10 @@ telemetry_legacy_init(void)
return -1;
}
v1_socket.sock = create_socket(v1_socket.path);
-   if (v1_socket.sock < 0)
+   if (v1_socket.sock < 0) {
+   v1_socket.path[0] = '\0';
return -1;
+   }
rc = pthread_create(&t_old, NULL, socket_listener, &v1_socket);
if (rc != 0) {
TMTY_LOG(ERR, "Error with create legcay socket thread: %s\n",
@@ -533,7 +552,9 @@ telemetry_legacy_init(void)
  static int
  telemetry_v2_init(void)
  {
+   char spath[sizeof(v2_socket.path)];
pthread_t t_new;
+   short suffix = 0;
int rc;
  
  	v2_socket.num_clients = &v2_clients;

@@ -544,15 +565,27 @@ telemetry_v2_init(void)
rte_telemetry_register_cmd("/help", command_help,
"Returns help text for a command. Parameters: string 
command");
v2_socket.fn = client_handler;
-   if (strlcpy(v2_socket.path, get_socket_path(socket_dir, 2),
-   sizeof(v2_socket.path)) >= sizeof(v2_socket.path)) {
+   if (strlcpy(spath, get_socket_path(socket_dir, 2), sizeof(spath)) >= sizeof(spath)) {
TMTY_LOG(ERR, "Error with socket binding, path too long\n");
return -1;
}
+   memcpy(v2_socket.path, spath, sizeof(v2_socket.path));
  
  	v2_socket.sock = create_socket(v2_socket.path);

-   if (v2_socket.sock < 0)
-   return -1;
+   while (v2_socket.sock < 0) {
+   /* bail out on unexpected error, or suffix wrap-around */
+   if (v2_socket.sock != -EADDRINUSE || suffix < 0) {
+   v2_socket.path[0] = '\0'; /* clear socket path */
+   return -1;
+   }
+   /* add a suffix to the path if the basic version fails */
+   if (snprintf(

Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Thomas Monjalon
14/10/2021 11:31, Harman Kalra:
> From: Thomas Monjalon 
> > 13/10/2021 20:52, Thomas Monjalon:
> > > 13/10/2021 19:57, Harman Kalra:
> > > > From: dev  On Behalf Of Harman Kalra
> > > > > From: Thomas Monjalon 
> > > > > > 04/10/2021 11:57, David Marchand:
> > > > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra
> > > > > > > 
> > > > > > wrote:
> > > > > > > > > > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int
> > size,
> > > > > > > > > > +
> > > > > > > > > > +bool
> > > > > > > > > > +from_hugepage) {
> > > > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > > > +   int i;
> > > > > > > > > > +
> > > > > > > > > > +   if (from_hugepage)
> > > > > > > > > > +   intr_handle = rte_zmalloc(NULL,
> > > > > > > > > > + size * 
> > > > > > > > > > sizeof(struct rte_intr_handle),
> > > > > > > > > > + 0);
> > > > > > > > > > +   else
> > > > > > > > > > +   intr_handle = calloc(1, size *
> > > > > > > > > > + sizeof(struct rte_intr_handle));
> > > > > > > > >
> > > > > > > > > We can call DPDK allocator in all cases.
> > > > > > > > > That would avoid headaches on why multiprocess does not
> > > > > > > > > work in some rarely tested cases.
> > [...]
> > > > > > I agree with David.
> > > > > > I prefer a simpler API which always use rte_malloc, and make
> > > > > > sure interrupts are always handled between rte_eal_init and
> > rte_eal_cleanup.
> > [...]
> > > > > There are couple of more dependencies on glibc heap APIs:
> > > > > 1. "rte_eal_alarm_init()" allocates an interrupt instance which is
> > > > > used for timerfd, is called before "rte_eal_memory_init()" which
> > > > > does the memseg init.
> > > > > Not sure what all challenges we may face in moving alarm_init
> > > > > after memory_init as it might break some subsystem inits.
> > > > > Other option could be to allocate interrupt instance for timerfd
> > > > > on first alarm_setup call.
> > >
> > > Indeed it is an issue.
> > >
> > > [...]
> > >
> > > > > There are many other drivers which statically declares the
> > > > > interrupt handles inside their respective private structures and
> > > > > memory for those structure was allocated from heap. For such
> > > > > drivers I allocated interrupt instances also using glibc heap APIs.
> > >
> > > Could you use rte_malloc in these drivers?
> > 
> > If we take the direction of 2 different allocations mode for the 
> > interrupts, I
> > suggest we make it automatic without any API parameter.
> > We don't have any function to check rte_malloc readiness I think.
> > But we can detect whether shared memory is ready with this check:
> > rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC This check
> > is true at the end of rte_eal_init, so it is false during probing.
> > Would it be enough? Or should we implement rte_malloc_is_ready()?
> 
> Hi Thomas,
> 
> It's a very good suggestion. Let's implement "rte_malloc_is_ready()" which 
> could be as
> simple as " rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC" 
> check.
> There may be more consumers for this API in future.

You cannot rely on the magic because it is set only after probing.
For such an API you need another internal flag to check that malloc is set up.
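
As a rough sketch of that idea, assuming a dedicated flag that the heap initialization sets once it has finished: all names below are hypothetical, and calloc() stands in for both allocators so that the example stays self-contained.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical internal flag, set once the DPDK heap is usable. */
static bool sketch_malloc_ready;

static void
sketch_malloc_mark_ready(void)
{
	sketch_malloc_ready = true;
}

static bool
sketch_malloc_is_ready(void)
{
	return sketch_malloc_ready;
}

/* Hypothetical handle that records which heap it came from. */
struct sketch_intr_obj {
	int fd;
	bool from_dpdk_heap;
};

static struct sketch_intr_obj *
sketch_intr_obj_alloc(void)
{
	/*
	 * A real implementation would call rte_zmalloc() when the flag is
	 * set and fall back to calloc() otherwise; calloc() is used for
	 * both branches here to avoid depending on DPDK.
	 */
	struct sketch_intr_obj *h = calloc(1, sizeof(*h));

	if (h != NULL)
		h->from_dpdk_heap = sketch_malloc_is_ready();
	return h;
}
```

Recording the origin in the handle lets the matching free routine pick the right deallocator, which matters for instances created early, such as the alarm one allocated before memory init.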




[dpdk-dev] [PATCH v7 00/12] dma: add dmadev driver for ioat devices

2021-10-14 Thread Conor Walsh
This patchset adds a dmadev driver and associated documentation to support
Intel QuickData Technology devices, part of the Intel I/O Acceleration
Technology (Intel I/OAT). This driver is intended to ultimately replace
the current IOAT part of the IOAT rawdev driver.
This patchset passes all the driver tests added in the dmadev test suite.

NOTE: This patchset has several dependencies:
 - v26 of the dmadev set [1]
 - v7 of the dmadev test suite [2]
 - v7 of the IDXD driver [3]

[1] http://patches.dpdk.org/project/dpdk/list/?series=19594
[2] http://patches.dpdk.org/project/dpdk/list/?series=19599
[3] http://patches.dpdk.org/project/dpdk/list/?series=19603

---

v7:
 - Minor rework to update from v23 to v26 of the dmadev lib.

v6:
 - Added rawdev IOAT deprecation notice to deprecation.rst.

v5:
 - Updated to v23 of the dmadev lib.
 - Removed experimental tag for driver from MAINTAINERS.
 - Separated IOAT and IDXD announcements in release notes.
 - Added missing check for rte_dma_get_dev_id in destroy.
 - Fixed memleak in destroy caused by NULL pointer.
 - Rewrote part of the docs to reduce duplication with DMA and IDXD.
 - Added patch to deprecate the rawdev IOAT driver.
 - Reworked destroy and close functions.
 - Added RTE_DMA_CAPA_HANDLES_ERRORS flag for IOAT versions >=3.4.
 - Other minor changes to IOAT driver.

v4:
 - Changes needed to update from dmadev v21 to v22.
 - Fixed 32-bit build.
 - Made stats reset logic easier to understand.

v3:
 - Added burst capacity function.
 - Stop function now waits for suspend rather than just using a sleep.
 - Changed from vchan idle to vchan status function.
 - Other minor changes to update from dmadev v19 to v21.

v2:
 - Rebased on the above patchsets.

Conor Walsh (12):
  dma/ioat: add device probe and removal functions
  dma/ioat: create dmadev instances on PCI probe
  dma/ioat: add datapath structures
  dma/ioat: add configuration functions
  dma/ioat: add start and stop functions
  dma/ioat: add data path job submission functions
  dma/ioat: add data path completion functions
  dma/ioat: add statistics
  dma/ioat: add support for vchan status function
  dma/ioat: add burst capacity function
  devbind: move ioat device IDs to dmadev category
  raw/ioat: deprecate ioat rawdev driver

 MAINTAINERS|   8 +-
 doc/guides/dmadevs/index.rst   |   2 +
 doc/guides/dmadevs/ioat.rst| 127 +
 doc/guides/rawdevs/ioat.rst|   4 +
 doc/guides/rel_notes/deprecation.rst   |   7 +
 doc/guides/rel_notes/release_21_11.rst |   6 +
 drivers/dma/ioat/ioat_dmadev.c | 748 +
 drivers/dma/ioat/ioat_hw_defs.h| 295 ++
 drivers/dma/ioat/ioat_internal.h   |  47 ++
 drivers/dma/ioat/meson.build   |   7 +
 drivers/dma/ioat/version.map   |   3 +
 drivers/dma/meson.build|   1 +
 usertools/dpdk-devbind.py  |   7 +-
 13 files changed, 1257 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/dmadevs/ioat.rst
 create mode 100644 drivers/dma/ioat/ioat_dmadev.c
 create mode 100644 drivers/dma/ioat/ioat_hw_defs.h
 create mode 100644 drivers/dma/ioat/ioat_internal.h
 create mode 100644 drivers/dma/ioat/meson.build
 create mode 100644 drivers/dma/ioat/version.map

-- 
2.25.1



[dpdk-dev] [PATCH v7 01/12] dma/ioat: add device probe and removal functions

2021-10-14 Thread Conor Walsh
Add the basic device probe/remove skeleton code and initial documentation
for new IOAT DMA driver. Maintainers update is also included in this
patch.

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
Reviewed-by: Chengwen Feng 
---
 MAINTAINERS|  6 +++
 doc/guides/dmadevs/index.rst   |  2 +
 doc/guides/dmadevs/ioat.rst| 69 ++
 doc/guides/rel_notes/release_21_11.rst |  6 +++
 drivers/dma/ioat/ioat_dmadev.c | 69 ++
 drivers/dma/ioat/ioat_hw_defs.h| 35 +
 drivers/dma/ioat/ioat_internal.h   | 20 
 drivers/dma/ioat/meson.build   |  7 +++
 drivers/dma/ioat/version.map   |  3 ++
 drivers/dma/meson.build|  1 +
 10 files changed, 218 insertions(+)
 create mode 100644 doc/guides/dmadevs/ioat.rst
 create mode 100644 drivers/dma/ioat/ioat_dmadev.c
 create mode 100644 drivers/dma/ioat/ioat_hw_defs.h
 create mode 100644 drivers/dma/ioat/ioat_internal.h
 create mode 100644 drivers/dma/ioat/meson.build
 create mode 100644 drivers/dma/ioat/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 423d8a73ce..283c70f7d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1209,6 +1209,12 @@ M: Kevin Laatz 
 F: drivers/dma/idxd/
 F: doc/guides/dmadevs/idxd.rst
 
+Intel IOAT
+M: Bruce Richardson 
+M: Conor Walsh 
+F: drivers/dma/ioat/
+F: doc/guides/dmadevs/ioat.rst
+
 
 RegEx Drivers
 -------------
diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
index 5d4abf880e..c59f4b5c92 100644
--- a/doc/guides/dmadevs/index.rst
+++ b/doc/guides/dmadevs/index.rst
@@ -12,3 +12,5 @@ an application through DMA API.
:numbered:
 
idxd
+   ioat
+
diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
new file mode 100644
index 00..9ae1d8a2ad
--- /dev/null
+++ b/doc/guides/dmadevs/ioat.rst
@@ -0,0 +1,69 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(c) 2021 Intel Corporation.
+
+.. include:: 
+
+IOAT DMA Device Driver
+======================
+
+The ``ioat`` dmadev driver provides a poll-mode driver (PMD) for Intel\
+|reg| QuickData Technology which is part of Intel\ |reg| I/O
+Acceleration Technology (`Intel I/OAT
+`_).
+This PMD, when used on supported hardware, allows data copies, for example,
+cloning packet data, to be accelerated by IOAT hardware rather than having to
+be done by software, freeing up CPU cycles for other tasks.
+
+Hardware Requirements
+---------------------
+
+The ``dpdk-devbind.py`` script, included with DPDK, can be used to show the
+presence of supported hardware. Running ``dpdk-devbind.py --status-dev dma``
+will show all the DMA devices on the system, IOAT devices are included in this
+list. For Intel\ |reg| IOAT devices, the hardware will often be listed as
+"Crystal Beach DMA", or "CBDMA" or on some newer systems '0b00' due to the
+absence of pci-id database entries for them at this point.
+
+.. note::
+Error handling is not supported by this driver on hardware prior to
+Intel Ice Lake. Unsupported systems include Broadwell, Skylake and
+Cascade Lake.
+
+Compilation
+-----------
+
+For builds using ``meson`` and ``ninja``, the driver will be built when the
+target platform is x86-based. No additional compilation steps are necessary.
+
+Device Setup
+-------------
+
+Intel\ |reg| IOAT devices will need to be bound to a suitable DPDK-supported
+user-space IO driver such as ``vfio-pci`` in order to be used by DPDK.
+
+The ``dpdk-devbind.py`` script can be used to view the state of the devices using::
+
+   $ dpdk-devbind.py --status-dev dma
+
+The ``dpdk-devbind.py`` script can also be used to bind devices to a suitable driver.
+For example::
+
+   $ dpdk-devbind.py -b vfio-pci 00:01.0 00:01.1
+
+Device Probing and Initialization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For devices bound to a suitable DPDK-supported driver (``vfio-pci``), the HW
+devices will be found as part of the device scan done at application
+initialization time without the need to pass parameters to the application.
+
+If the application does not require all the devices available, an allowlist can
+be used in the same way that other DPDK devices use them.
+
+For example::
+
+   $ dpdk-test -a 
+
+Once probed successfully, the device will appear as a ``dmadev``, that is a
+"DMA device type" inside DPDK, and can be accessed using APIs from the
+``rte_dmadev`` library.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 65868a730a..9f99a5b380 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -85,6 +85,12 @@ New Features
   The IDXD dmadev driver provide device drivers for the Intel DSA devices.
   This device driver can be used through the generic dmadev API.
 
+* **Added IOAT dmadev driver implemen

[dpdk-dev] [PATCH v7 02/12] dma/ioat: create dmadev instances on PCI probe

2021-10-14 Thread Conor Walsh
When a suitable device is found during the PCI probe, create a dmadev
instance for each channel. Internal structures and HW definitions required
for device creation are also included.

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
---
 drivers/dma/ioat/ioat_dmadev.c   | 105 ++-
 drivers/dma/ioat/ioat_hw_defs.h  |  45 +
 drivers/dma/ioat/ioat_internal.h |  27 
 3 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index f3491d45b1..90f54567a4 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ioat_internal.h"
 
@@ -14,6 +15,106 @@ RTE_LOG_REGISTER_DEFAULT(ioat_pmd_logtype, INFO);
 #define IOAT_PMD_NAME dmadev_ioat
 #define IOAT_PMD_NAME_STR RTE_STR(IOAT_PMD_NAME)
 
+/* Create a DMA device. */
+static int
+ioat_dmadev_create(const char *name, struct rte_pci_device *dev)
+{
+   static const struct rte_dma_dev_ops ioat_dmadev_ops = { };
+
+   struct rte_dma_dev *dmadev = NULL;
+   struct ioat_dmadev *ioat = NULL;
+   int retry = 0;
+
+   if (!name) {
+   IOAT_PMD_ERR("Invalid name of the device!");
+   return -EINVAL;
+   }
+
+   /* Allocate device structure. */
+   dmadev = rte_dma_pmd_allocate(name, dev->device.numa_node, sizeof(struct ioat_dmadev));
+   if (dmadev == NULL) {
+   IOAT_PMD_ERR("Unable to allocate dma device");
+   return -ENOMEM;
+   }
+
+   dmadev->device = &dev->device;
+
+   dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+
+   dmadev->dev_ops = &ioat_dmadev_ops;
+
+   ioat = dmadev->data->dev_private;
+   ioat->dmadev = dmadev;
+   ioat->regs = dev->mem_resource[0].addr;
+   ioat->doorbell = &ioat->regs->dmacount;
+   ioat->qcfg.nb_desc = 0;
+   ioat->desc_ring = NULL;
+   ioat->version = ioat->regs->cbver;
+
+   /* Do device initialization - reset and set error behaviour. */
+   if (ioat->regs->chancnt != 1)
+   IOAT_PMD_WARN("%s: Channel count == %d\n", __func__,
+   ioat->regs->chancnt);
+
+   /* Locked by someone else. */
+   if (ioat->regs->chanctrl & IOAT_CHANCTRL_CHANNEL_IN_USE) {
+   IOAT_PMD_WARN("%s: Channel appears locked\n", __func__);
+   ioat->regs->chanctrl = 0;
+   }
+
+   /* clear any previous errors */
+   if (ioat->regs->chanerr != 0) {
+   uint32_t val = ioat->regs->chanerr;
+   ioat->regs->chanerr = val;
+   }
+
+   ioat->regs->chancmd = IOAT_CHANCMD_SUSPEND;
+   rte_delay_ms(1);
+   ioat->regs->chancmd = IOAT_CHANCMD_RESET;
+   rte_delay_ms(1);
+   while (ioat->regs->chancmd & IOAT_CHANCMD_RESET) {
+   ioat->regs->chainaddr = 0;
+   rte_delay_ms(1);
+   if (++retry >= 200) {
+   IOAT_PMD_ERR("%s: cannot reset device. CHANCMD=%#"PRIx8
+   ", CHANSTS=%#"PRIx64", CHANERR=%#"PRIx32"\n",
+   __func__,
+   ioat->regs->chancmd,
+   ioat->regs->chansts,
+   ioat->regs->chanerr);
+   rte_dma_pmd_release(name);
+   return -EIO;
+   }
+   }
+   ioat->regs->chanctrl = IOAT_CHANCTRL_ANY_ERR_ABORT_EN |
+   IOAT_CHANCTRL_ERR_COMPLETION_EN;
+
+   dmadev->fp_obj->dev_private = ioat;
+
+   dmadev->state = RTE_DMA_DEV_READY;
+
+   return 0;
+
+}
+
+/* Destroy a DMA device. */
+static int
+ioat_dmadev_destroy(const char *name)
+{
+   int ret;
+
+   if (!name) {
+   IOAT_PMD_ERR("Invalid device name");
+   return -EINVAL;
+   }
+
+   ret = rte_dma_pmd_release(name);
+   if (ret)
+   IOAT_PMD_DEBUG("Device cleanup failed");
+
+   return 0;
+}
+
 /* Probe DMA device. */
 static int
 ioat_dmadev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
@@ -24,7 +125,7 @@ ioat_dmadev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
IOAT_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
 
dev->device.driver = &drv->driver;
-   return 0;
+   return ioat_dmadev_create(name, dev);
 }
 
 /* Remove DMA device. */
@@ -38,7 +139,7 @@ ioat_dmadev_remove(struct rte_pci_device *dev)
IOAT_PMD_INFO("Closing %s on NUMA node %d",
name, dev->device.numa_node);
 
-   return 0;
+   return ioat_dmadev_destroy(name);
 }
 
 static const struct rte_pci_id pci_id_ioat_map[] = {
diff --git a/drivers/dma/ioat/ioat_hw_defs.h b/drivers/dma/ioat/ioat_hw_defs.h
index eeabba41ef..73bdf548b3 100644
--- a/drivers/dma/ioat/ioat_hw_defs.h
+++ b/drivers/d

[dpdk-dev] [PATCH v7 03/12] dma/ioat: add datapath structures

2021-10-14 Thread Conor Walsh
Add data structures required for the data path of IOAT devices.

Signed-off-by: Conor Walsh 
Signed-off-by: Bruce Richardson 
Reviewed-by: Kevin Laatz 
---
 drivers/dma/ioat/ioat_dmadev.c  |  70 ++-
 drivers/dma/ioat/ioat_hw_defs.h | 215 
 2 files changed, 284 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index 90f54567a4..876e17f320 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -15,11 +15,79 @@ RTE_LOG_REGISTER_DEFAULT(ioat_pmd_logtype, INFO);
 #define IOAT_PMD_NAME dmadev_ioat
 #define IOAT_PMD_NAME_STR RTE_STR(IOAT_PMD_NAME)
 
+/* Dump DMA device info. */
+static int
+__dev_dump(void *dev_private, FILE *f)
+{
+   struct ioat_dmadev *ioat = dev_private;
+   uint64_t chansts_masked = ioat->regs->chansts & IOAT_CHANSTS_STATUS;
+   uint32_t chanerr = ioat->regs->chanerr;
+   uint64_t mask = (ioat->qcfg.nb_desc - 1);
+   char ver = ioat->version;
+   fprintf(f, "= IOAT =\n");
+   fprintf(f, "  IOAT version: %d.%d\n", ver >> 4, ver & 0xF);
+   fprintf(f, "  Channel status: %s [0x%"PRIx64"]\n",
+   chansts_readable[chansts_masked], chansts_masked);
+   fprintf(f, "  ChainADDR: 0x%"PRIx64"\n", ioat->regs->chainaddr);
+   if (chanerr == 0) {
+   fprintf(f, "  No Channel Errors\n");
+   } else {
+   fprintf(f, "  ChanERR: 0x%"PRIx32"\n", chanerr);
+   if (chanerr & IOAT_CHANERR_INVALID_SRC_ADDR_MASK)
+   fprintf(f, "Invalid Source Address\n");
+   if (chanerr & IOAT_CHANERR_INVALID_DST_ADDR_MASK)
+   fprintf(f, "Invalid Destination Address\n");
+   if (chanerr & IOAT_CHANERR_INVALID_LENGTH_MASK)
+   fprintf(f, "Invalid Descriptor Length\n");
+   if (chanerr & IOAT_CHANERR_DESCRIPTOR_READ_ERROR_MASK)
+   fprintf(f, "Descriptor Read Error\n");
+   if ((chanerr & ~(IOAT_CHANERR_INVALID_SRC_ADDR_MASK |
+   IOAT_CHANERR_INVALID_DST_ADDR_MASK |
+   IOAT_CHANERR_INVALID_LENGTH_MASK |
+   IOAT_CHANERR_DESCRIPTOR_READ_ERROR_MASK)) != 0)
+   fprintf(f, "Unknown Error(s)\n");
+   }
+   fprintf(f, "== Private Data ==\n");
+   fprintf(f, "  Config: { ring_size: %u }\n", ioat->qcfg.nb_desc);
+   fprintf(f, "  Status: 0x%"PRIx64"\n", ioat->status);
+   fprintf(f, "  Status IOVA: 0x%"PRIx64"\n", ioat->status_addr);
+   fprintf(f, "  Status ADDR: %p\n", &ioat->status);
+   fprintf(f, "  Ring IOVA: 0x%"PRIx64"\n", ioat->ring_addr);
+   fprintf(f, "  Ring ADDR: 0x%"PRIx64"\n", ioat->desc_ring[0].next-64);
+   fprintf(f, "  Next write: %"PRIu16"\n", ioat->next_write);
+   fprintf(f, "  Next read: %"PRIu16"\n", ioat->next_read);
+   struct ioat_dma_hw_desc *desc_ring = &ioat->desc_ring[(ioat->next_write - 1) & mask];
+   fprintf(f, "  Last Descriptor Written {\n");
+   fprintf(f, "Size: %"PRIu32"\n", desc_ring->size);
+   fprintf(f, "Control: 0x%"PRIx32"\n", desc_ring->u.control_raw);
+   fprintf(f, "Src: 0x%"PRIx64"\n", desc_ring->src_addr);
+   fprintf(f, "Dest: 0x%"PRIx64"\n", desc_ring->dest_addr);
+   fprintf(f, "Next: 0x%"PRIx64"\n", desc_ring->next);
+   fprintf(f, "  }\n");
+   fprintf(f, "  Next Descriptor {\n");
+   fprintf(f, "Size: %"PRIu32"\n", ioat->desc_ring[ioat->next_read & mask].size);
+   fprintf(f, "Src: 0x%"PRIx64"\n", ioat->desc_ring[ioat->next_read & mask].src_addr);
+   fprintf(f, "Dest: 0x%"PRIx64"\n", ioat->desc_ring[ioat->next_read & mask].dest_addr);
+   fprintf(f, "Next: 0x%"PRIx64"\n", ioat->desc_ring[ioat->next_read & mask].next);
+   fprintf(f, "  }\n");
+
+   return 0;
+}
+
+/* Public wrapper for dump. */
+static int
+ioat_dev_dump(const struct rte_dma_dev *dev, FILE *f)
+{
+   return __dev_dump(dev->fp_obj->dev_private, f);
+}
+
 /* Create a DMA device. */
 static int
 ioat_dmadev_create(const char *name, struct rte_pci_device *dev)
 {
-   static const struct rte_dma_dev_ops ioat_dmadev_ops = { };
+   static const struct rte_dma_dev_ops ioat_dmadev_ops = {
+   .dev_dump = ioat_dev_dump,
+   };
 
struct rte_dma_dev *dmadev = NULL;
struct ioat_dmadev *ioat = NULL;
diff --git a/drivers/dma/ioat/ioat_hw_defs.h b/drivers/dma/ioat/ioat_hw_defs.h
index 73bdf548b3..dc3493a78f 100644
--- a/drivers/dma/ioat/ioat_hw_defs.h
+++ b/drivers/dma/ioat/ioat_hw_defs.h
@@ -15,6 +15,7 @@ extern "C" {
 
 #define IOAT_VER_3_0   0x30
 #define IOAT_VER_3_3   0x33
+#define IOAT_VER_3_4   0x34
 
 #define IOAT_VENDOR_ID 0x8086
 #define IOAT_DEVICE_ID_SKX 0x2021
@@ -43,6 +44,14 @@ extern "C" {
 #define IOAT_CHANCTRL_ERR_COMPLETION_EN 

[dpdk-dev] [PATCH v7 04/12] dma/ioat: add configuration functions

2021-10-14 Thread Conor Walsh
Add functions for device configuration. The info_get and close functions
are included here also. info_get can be useful for checking successful
configuration and close is used by the dmadev API when releasing a
configured device.

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
---
 doc/guides/dmadevs/ioat.rst|  15 +
 drivers/dma/ioat/ioat_dmadev.c | 107 +
 2 files changed, 122 insertions(+)

diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
index 9ae1d8a2ad..af69556241 100644
--- a/doc/guides/dmadevs/ioat.rst
+++ b/doc/guides/dmadevs/ioat.rst
@@ -67,3 +67,18 @@ For example::
 Once probed successfully, the device will appear as a ``dmadev``, that is a
 "DMA device type" inside DPDK, and can be accessed using APIs from the
 ``rte_dmadev`` library.
+
+Using IOAT DMAdev Devices
+--------------------------
+
+To use IOAT devices from an application, the ``dmadev`` API can be used.
+
+Device Configuration
+~~~~~~~~~~~~~~~~~~~~~
+
+IOAT configuration requirements:
+
+* ``ring_size`` must be a power of two, between 64 and 4096.
+* Only one ``vchan`` is supported per device.
+* Silent mode is not supported.
+* The transfer direction must be set to ``RTE_DMA_DIR_MEM_TO_MEM`` to copy from memory to memory.
diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index 876e17f320..ada57c5814 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -12,9 +12,112 @@ static struct rte_pci_driver ioat_pmd_drv;
 
 RTE_LOG_REGISTER_DEFAULT(ioat_pmd_logtype, INFO);
 
+#define DESC_SZ sizeof(struct ioat_dma_hw_desc)
+
 #define IOAT_PMD_NAME dmadev_ioat
 #define IOAT_PMD_NAME_STR RTE_STR(IOAT_PMD_NAME)
 
+/* Configure a device. */
+static int
+ioat_dev_configure(struct rte_dma_dev *dev __rte_unused, const struct rte_dma_conf *dev_conf,
+   uint32_t conf_sz)
+{
+   if (sizeof(struct rte_dma_conf) != conf_sz)
+   return -EINVAL;
+
+   if (dev_conf->nb_vchans != 1)
+   return -EINVAL;
+
+   return 0;
+}
+
+/* Setup a virtual channel for IOAT, only 1 vchan is supported. */
+static int
+ioat_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+   const struct rte_dma_vchan_conf *qconf, uint32_t qconf_sz)
+{
+   struct ioat_dmadev *ioat = dev->fp_obj->dev_private;
+   uint16_t max_desc = qconf->nb_desc;
+   int i;
+
+   if (sizeof(struct rte_dma_vchan_conf) != qconf_sz)
+   return -EINVAL;
+
+   ioat->qcfg = *qconf;
+
+   if (!rte_is_power_of_2(max_desc)) {
+   max_desc = rte_align32pow2(max_desc);
+   IOAT_PMD_DEBUG("DMA dev %u using %u descriptors", dev->data->dev_id, max_desc);
+   ioat->qcfg.nb_desc = max_desc;
+   }
+
+   /* In case we are reconfiguring a device, free any existing memory. */
+   rte_free(ioat->desc_ring);
+
+   ioat->desc_ring = rte_zmalloc(NULL, sizeof(*ioat->desc_ring) * max_desc, 0);
+   if (ioat->desc_ring == NULL)
+   return -ENOMEM;
+
+   ioat->ring_addr = rte_mem_virt2iova(ioat->desc_ring);
+
+   ioat->status_addr = rte_mem_virt2iova(ioat) + offsetof(struct ioat_dmadev, status);
+
+   /* Ensure all counters are reset, if reconfiguring/restarting device. */
+   ioat->next_read = 0;
+   ioat->next_write = 0;
+   ioat->last_write = 0;
+   ioat->offset = 0;
+   ioat->failure = 0;
+
+   /* Configure descriptor ring - each one points to next. */
+   for (i = 0; i < ioat->qcfg.nb_desc; i++) {
+   ioat->desc_ring[i].next = ioat->ring_addr +
+   (((i + 1) % ioat->qcfg.nb_desc) * DESC_SZ);
+   }
+
+   return 0;
+}
+
+/* Get device information of a device. */
+static int
+ioat_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info, uint32_t size)
+{
+   struct ioat_dmadev *ioat = dev->fp_obj->dev_private;
+   if (size < sizeof(*info))
+   return -EINVAL;
+   info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM |
+   RTE_DMA_CAPA_OPS_COPY |
+   RTE_DMA_CAPA_OPS_FILL;
+   if (ioat->version >= IOAT_VER_3_4)
+   info->dev_capa |= RTE_DMA_CAPA_HANDLES_ERRORS;
+   info->max_vchans = 1;
+   info->min_desc = 32;
+   info->max_desc = 4096;
+   return 0;
+}
+
+/* Close a configured device. */
+static int
+ioat_dev_close(struct rte_dma_dev *dev)
+{
+   struct ioat_dmadev *ioat;
+
+   if (!dev) {
+   IOAT_PMD_ERR("Invalid device");
+   return -EINVAL;
+   }
+
+   ioat = dev->fp_obj->dev_private;
+   if (!ioat) {
+   IOAT_PMD_ERR("Error getting dev_private");
+   return -EINVAL;
+   }
+
+   rte_free(ioat->desc_ring);
+
+   return 0;
+}
+
 /* Dump DMA device info. */
 static int
 __dev_dump(void *dev_private, FILE *f)
@@ -86,7 +189,11 @@ static int
 ioat_dmadev_create(const

[dpdk-dev] [PATCH v7 05/12] dma/ioat: add start and stop functions

2021-10-14 Thread Conor Walsh
Add start, stop and recover functions for IOAT devices.

Signed-off-by: Conor Walsh 
Signed-off-by: Bruce Richardson 
Reviewed-by: Kevin Laatz 
---
 doc/guides/dmadevs/ioat.rst|  3 ++
 drivers/dma/ioat/ioat_dmadev.c | 92 ++
 2 files changed, 95 insertions(+)

diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
index af69556241..df159f9957 100644
--- a/doc/guides/dmadevs/ioat.rst
+++ b/doc/guides/dmadevs/ioat.rst
@@ -82,3 +82,6 @@ IOAT configuration requirements:
 * Only one ``vchan`` is supported per device.
 * Silent mode is not supported.
 * The transfer direction must be set to ``RTE_DMA_DIR_MEM_TO_MEM`` to copy from memory to memory.
+
+Once configured, the device can then be made ready for use by calling the
+``rte_dma_start()`` API.
diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index ada57c5814..cf28f4a7e6 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -78,6 +78,96 @@ ioat_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan 
__rte_unused,
return 0;
 }
 
+/* Recover IOAT device. */
+static inline int
+__ioat_recover(struct ioat_dmadev *ioat)
+{
+   uint32_t chanerr, retry = 0;
+   uint16_t mask = ioat->qcfg.nb_desc - 1;
+
+   /* Clear any channel errors. Reading and writing to chanerr does this. */
+   chanerr = ioat->regs->chanerr;
+   ioat->regs->chanerr = chanerr;
+
+   /* Reset Channel. */
+   ioat->regs->chancmd = IOAT_CHANCMD_RESET;
+
+   /* Write new chain address to trigger state change. */
+   ioat->regs->chainaddr = ioat->desc_ring[(ioat->next_read - 1) & mask].next;
+   /* Ensure channel control and status addr are correct. */
+   ioat->regs->chanctrl = IOAT_CHANCTRL_ANY_ERR_ABORT_EN |
+   IOAT_CHANCTRL_ERR_COMPLETION_EN;
+   ioat->regs->chancmp = ioat->status_addr;
+
+   /* Allow HW time to move to the ARMED state. */
+   do {
+   rte_pause();
+   retry++;
+   } while (ioat->regs->chansts != IOAT_CHANSTS_ARMED && retry < 200);
+
+   /* Exit as failure if device is still HALTED. */
+   if (ioat->regs->chansts != IOAT_CHANSTS_ARMED)
+   return -1;
+
+   /* Store next read as offset, as recover will move the HW and SW rings out of sync. */
+   ioat->offset = ioat->next_read;
+
+   /* Prime status register with previous address. */
+   ioat->status = ioat->desc_ring[(ioat->next_read - 2) & mask].next;
+
+   return 0;
+}
+
+/* Start a configured device. */
+static int
+ioat_dev_start(struct rte_dma_dev *dev)
+{
+   struct ioat_dmadev *ioat = dev->fp_obj->dev_private;
+
+   if (ioat->qcfg.nb_desc == 0 || ioat->desc_ring == NULL)
+   return -EBUSY;
+
+   /* Inform hardware of where the descriptor ring is. */
+   ioat->regs->chainaddr = ioat->ring_addr;
+   /* Inform hardware of where to write the status/completions. */
+   ioat->regs->chancmp = ioat->status_addr;
+
+   /* Prime the status register to be set to the last element. */
+   ioat->status = ioat->ring_addr + ((ioat->qcfg.nb_desc - 1) * DESC_SZ);
+
+   printf("IOAT.status: %s [0x%"PRIx64"]\n",
+   chansts_readable[ioat->status & IOAT_CHANSTS_STATUS],
+   ioat->status);
+
+   if ((ioat->regs->chansts & IOAT_CHANSTS_STATUS) == IOAT_CHANSTS_HALTED) {
+   IOAT_PMD_WARN("Device HALTED on start, attempting to recover\n");
+   if (__ioat_recover(ioat) != 0) {
+   IOAT_PMD_ERR("Device couldn't be recovered");
+   return -1;
+   }
+   }
+
+   return 0;
+}
+
+/* Stop a configured device. */
+static int
+ioat_dev_stop(struct rte_dma_dev *dev)
+{
+   struct ioat_dmadev *ioat = dev->fp_obj->dev_private;
+   uint32_t retry = 0;
+
+   ioat->regs->chancmd = IOAT_CHANCMD_SUSPEND;
+
+   do {
+   rte_pause();
+   retry++;
+   } while ((ioat->regs->chansts & IOAT_CHANSTS_STATUS) != IOAT_CHANSTS_SUSPENDED
+   && retry < 200);
+
+   return ((ioat->regs->chansts & IOAT_CHANSTS_STATUS) == IOAT_CHANSTS_SUSPENDED) ? 0 : -1;
+}
+
 /* Get device information of a device. */
 static int
 ioat_dev_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info, 
uint32_t size)
@@ -193,6 +283,8 @@ ioat_dmadev_create(const char *name, struct rte_pci_device 
*dev)
.dev_configure = ioat_dev_configure,
.dev_dump = ioat_dev_dump,
.dev_info_get = ioat_dev_info_get,
+   .dev_start = ioat_dev_start,
+   .dev_stop = ioat_dev_stop,
.vchan_setup = ioat_vchan_setup,
};
 
-- 
2.25.1
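
Reviewer note on the bounded polling used by the stop and recover paths above: both loops re-read the channel status a limited number of times (200 in the patch) and report failure if the wanted state never appears. The following is a minimal, hardware-free sketch of that pattern. `struct fake_chan`, `fake_chan_read()` and `poll_for_state()` are hypothetical names invented for illustration; they are not part of the driver or of DPDK, and the real code reads `ioat->regs->chansts` and calls `rte_pause()` between reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the channel status register: it reports
 * busy_state for a fixed number of reads, then ready_state. */
struct fake_chan {
	uint32_t reads_until_ready; /* busy reads left before the state flips */
	uint32_t ready_state;
	uint32_t busy_state;
};

static uint32_t
fake_chan_read(struct fake_chan *c)
{
	if (c->reads_until_ready == 0)
		return c->ready_state;
	c->reads_until_ready--;
	return c->busy_state;
}

/* Read the state at most max_retries times.
 * Returns 0 as soon as 'want' is observed, -1 if the budget runs out. */
static int
poll_for_state(struct fake_chan *c, uint32_t want, uint32_t max_retries)
{
	uint32_t retry = 0;

	assert(max_retries > 0);
	do {
		if (fake_chan_read(c) == want)
			return 0;
		retry++;
	} while (retry < max_retries);

	return -1;
}
```

The retry budget bounds the wait in loop iterations rather than in wall-clock time, so the effective timeout depends on how long each iteration takes on the target CPU.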



[dpdk-dev] [PATCH v7 06/12] dma/ioat: add data path job submission functions

2021-10-14 Thread Conor Walsh
Add data path functions for enqueuing and submitting operations to
IOAT devices.

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
Reviewed-by: Chengwen Feng 
---
 doc/guides/dmadevs/ioat.rst|  9 
 drivers/dma/ioat/ioat_dmadev.c | 92 ++
 2 files changed, 101 insertions(+)

diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
index df159f9957..9ee4e372a8 100644
--- a/doc/guides/dmadevs/ioat.rst
+++ b/doc/guides/dmadevs/ioat.rst
@@ -85,3 +85,12 @@ IOAT configuration requirements:
 
 Once configured, the device can then be made ready for use by calling the
 ``rte_dma_start()`` API.
+
+Performing Data Copies
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Refer to the :ref:`Enqueue / Dequeue APIs ` section of the dmadev library
+documentation for details on operation enqueue and submission API usage.
+
+It is expected that, for efficiency reasons, a burst of operations will be enqueued to the
+device via multiple enqueue calls between calls to the ``rte_dma_submit()`` function.
diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index cf28f4a7e6..4d00fec5c8 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ioat_internal.h"
 
@@ -17,6 +18,12 @@ RTE_LOG_REGISTER_DEFAULT(ioat_pmd_logtype, INFO);
 #define IOAT_PMD_NAME dmadev_ioat
 #define IOAT_PMD_NAME_STR RTE_STR(IOAT_PMD_NAME)
 
+/* IOAT operations. */
+enum rte_ioat_ops {
+   ioat_op_copy = 0,   /* Standard DMA Operation */
+   ioat_op_fill/* Block Fill */
+};
+
 /* Configure a device. */
 static int
 ioat_dev_configure(struct rte_dma_dev *dev __rte_unused, const struct 
rte_dma_conf *dev_conf,
@@ -208,6 +215,87 @@ ioat_dev_close(struct rte_dma_dev *dev)
return 0;
 }
 
+/* Trigger hardware to begin performing enqueued operations. */
+static inline void
+__submit(struct ioat_dmadev *ioat)
+{
+   *ioat->doorbell = ioat->next_write - ioat->offset;
+
+   ioat->last_write = ioat->next_write;
+}
+
+/* External submit function wrapper. */
+static int
+ioat_submit(void *dev_private, uint16_t qid __rte_unused)
+{
+   struct ioat_dmadev *ioat = dev_private;
+
+   __submit(ioat);
+
+   return 0;
+}
+
+/* Write descriptor for enqueue. */
+static inline int
+__write_desc(void *dev_private, uint32_t op, uint64_t src, phys_addr_t dst,
+   unsigned int length, uint64_t flags)
+{
+   struct ioat_dmadev *ioat = dev_private;
+   uint16_t ret;
+   const unsigned short mask = ioat->qcfg.nb_desc - 1;
+   const unsigned short read = ioat->next_read;
+   unsigned short write = ioat->next_write;
+   const unsigned short space = mask + read - write;
+   struct ioat_dma_hw_desc *desc;
+
+   if (space == 0)
+   return -ENOSPC;
+
+   ioat->next_write = write + 1;
+   write &= mask;
+
+   desc = &ioat->desc_ring[write];
+   desc->size = length;
+   desc->u.control_raw = (uint32_t)((op << IOAT_CMD_OP_SHIFT) |
+   (1 << IOAT_COMP_UPDATE_SHIFT));
+
+   /* In IOAT the fence ensures that all operations including the current one
+* are completed before moving on, DMAdev assumes that the fence ensures
+* all operations before the current one are completed before starting
+* the current one, so in IOAT we set the fence for the previous descriptor.
+*/
+   if (flags & RTE_DMA_OP_FLAG_FENCE)
+   ioat->desc_ring[(write - 1) & mask].u.control.fence = 1;
+
+   desc->src_addr = src;
+   desc->dest_addr = dst;
+
+   rte_prefetch0(&ioat->desc_ring[ioat->next_write & mask]);
+
+   ret = (uint16_t)(ioat->next_write - 1);
+
+   if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+   __submit(ioat);
+
+   return ret;
+}
+
+/* Enqueue a fill operation onto the ioat device. */
+static int
+ioat_enqueue_fill(void *dev_private, uint16_t qid __rte_unused, uint64_t pattern,
+   rte_iova_t dst, unsigned int length, uint64_t flags)
+{
+   return __write_desc(dev_private, ioat_op_fill, pattern, dst, length, flags);
+}
+
+/* Enqueue a copy operation onto the ioat device. */
+static int
+ioat_enqueue_copy(void *dev_private, uint16_t qid __rte_unused, rte_iova_t src,
+   rte_iova_t dst, unsigned int length, uint64_t flags)
+{
+   return __write_desc(dev_private, ioat_op_copy, src, dst, length, flags);
+}
+
 /* Dump DMA device info. */
 static int
 __dev_dump(void *dev_private, FILE *f)
@@ -310,6 +398,10 @@ ioat_dmadev_create(const char *name, struct rte_pci_device 
*dev)
 
dmadev->dev_ops = &ioat_dmadev_ops;
 
+   dmadev->fp_obj->copy = ioat_enqueue_copy;
+   dmadev->fp_obj->fill = ioat_enqueue_fill;
+   dmadev->fp_obj->submit = ioat_submit;
+
ioat = dmadev->data->dev_private;
ioat->dmadev = dmadev;
ioat->regs = dev->mem_resource[0].addr;
-- 
2.25.1
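
Reviewer note on the index arithmetic in `__write_desc()` above: `next_read` and `next_write` are free-running 16-bit counters, the ring slot is the counter masked with `nb_desc - 1`, and the free space is computed as `mask + read - write`, which leaves one slot unused. The sketch below reproduces only that arithmetic so it can be checked without hardware. `struct ring_idx`, `ring_space()` and `ring_enqueue()` are hypothetical names for illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of the driver state to its three ring counters. */
struct ring_idx {
	uint16_t nb_desc;    /* ring size, must be a power of two */
	uint16_t next_read;  /* free-running count of completed slots */
	uint16_t next_write; /* free-running count of enqueued slots */
};

/* Free slots; one slot always stays unused, matching "mask + read - write". */
static uint16_t
ring_space(const struct ring_idx *r)
{
	const uint16_t mask = r->nb_desc - 1;

	return (uint16_t)(mask + r->next_read - r->next_write);
}

/* Reserve one slot. On success stores the ring slot in *slot and returns
 * the job id (the counter value before the increment); returns -1 if full. */
static int
ring_enqueue(struct ring_idx *r, uint16_t *slot)
{
	const uint16_t mask = r->nb_desc - 1;

	assert(r->nb_desc != 0 && (r->nb_desc & mask) == 0);
	if (ring_space(r) == 0)
		return -1;

	*slot = r->next_write & mask;
	r->next_write++;

	return (uint16_t)(r->next_write - 1);
}
```

Because the result is truncated to 16 bits, the space calculation stays correct when the counters wrap past 65535, provided the ring size is a power of two.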



[dpdk-dev] [PATCH v7 07/12] dma/ioat: add data path completion functions

2021-10-14 Thread Conor Walsh
Add the data path functions for gathering completed operations
from IOAT devices.

Signed-off-by: Conor Walsh 
Signed-off-by: Kevin Laatz 
Acked-by: Bruce Richardson 
---
 doc/guides/dmadevs/ioat.rst|  33 +++-
 drivers/dma/ioat/ioat_dmadev.c | 141 +
 2 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
index 9ee4e372a8..9ac90e3108 100644
--- a/doc/guides/dmadevs/ioat.rst
+++ b/doc/guides/dmadevs/ioat.rst
@@ -90,7 +90,38 @@ Performing Data Copies
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 Refer to the :ref:`Enqueue / Dequeue APIs ` section of the dmadev library
-documentation for details on operation enqueue and submission API usage.
+documentation for details on operation enqueue, submission and completion API usage.
 
 It is expected that, for efficiency reasons, a burst of operations will be enqueued to the
 device via multiple enqueue calls between calls to the ``rte_dma_submit()`` function.
+
+When gathering completions, ``rte_dma_completed()`` should be used, up until the point an error
+occurs with an operation. If an error was encountered, ``rte_dma_completed_status()`` must be used
+to reset the device and continue processing operations. This function will also gather the status
+of each individual operation which is filled in to the ``status`` array provided as parameter
+by the application.
+
+The status codes supported by IOAT are:
+
+* ``RTE_DMA_STATUS_SUCCESSFUL``: The operation was successful.
+* ``RTE_DMA_STATUS_INVALID_SRC_ADDR``: The operation failed due to an invalid source address.
+* ``RTE_DMA_STATUS_INVALID_DST_ADDR``: The operation failed due to an invalid destination address.
+* ``RTE_DMA_STATUS_INVALID_LENGTH``: The operation failed due to an invalid descriptor length.
+* ``RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR``: The device could not read the descriptor.
+* ``RTE_DMA_STATUS_ERROR_UNKNOWN``: The operation failed due to an unspecified error.
+
+The following code shows how to retrieve the number of successfully completed
+copies within a burst and then uses ``rte_dma_completed_status()`` to check
+which operation failed and reset the device to continue processing operations:
+
+.. code-block:: C
+
+   enum rte_dma_status_code status[COMP_BURST_SZ];
+   uint16_t count, idx, status_count;
+   bool error = false;
+
+   count = rte_dma_completed(dev_id, vchan, COMP_BURST_SZ, &idx, &error);
+
+   if (error) {
+      status_count = rte_dma_completed_status(dev_id, vchan, COMP_BURST_SZ, &idx, status);
+   }
diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index 4d00fec5c8..0318f67772 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ioat_internal.h"
 
@@ -362,6 +363,144 @@ ioat_dev_dump(const struct rte_dma_dev *dev, FILE *f)
return __dev_dump(dev->fp_obj->dev_private, f);
 }
 
+/* Returns the index of the last completed operation. */
+static inline uint16_t
+__get_last_completed(const struct ioat_dmadev *ioat, int *state)
+{
+   /* Status register contains the address of the completed operation */
+   uint64_t status = ioat->status;
+
+   /* lower 3 bits indicate "transfer status" : active, idle, halted.
+* We can ignore bit 0.
+*/
+   *state = status & IOAT_CHANSTS_STATUS;
+
+   /* If we are just after recovering from an error the address returned by
+* status will be 0, in this case we return the offset - 1 as the last
+* completed. If not return the status value minus the chainaddr which
+* gives us an offset into the ring. Right shifting by 6 (divide by 64)
+* gives the index of the completion from the HW point of view and adding
+* the offset translates the ring index from HW to SW point of view.
+*/
+   if ((status & ~IOAT_CHANSTS_STATUS) == 0)
+   return ioat->offset - 1;
+
+   return (status - ioat->ring_addr) >> 6;
+}
+
+/* Translates IOAT ChanERRs to DMA error codes. */
+static inline enum rte_dma_status_code
+__translate_status_ioat_to_dma(uint32_t chanerr)
+{
+   if (chanerr & IOAT_CHANERR_INVALID_SRC_ADDR_MASK)
+   return RTE_DMA_STATUS_INVALID_SRC_ADDR;
+   else if (chanerr & IOAT_CHANERR_INVALID_DST_ADDR_MASK)
+   return RTE_DMA_STATUS_INVALID_DST_ADDR;
+   else if (chanerr & IOAT_CHANERR_INVALID_LENGTH_MASK)
+   return RTE_DMA_STATUS_INVALID_LENGTH;
+   else if (chanerr & IOAT_CHANERR_DESCRIPTOR_READ_ERROR_MASK)
+   return RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR;
+   else
+   return RTE_DMA_STATUS_ERROR_UNKNOWN;
+}
+
+/* Returns details of operations that have been completed. */
+static uint16_t
+ioat_completed(void *dev_private, uint16_t qid __rte_unused, const uint16_t 
max_ops,
+   uint16_t *last_idx, bool *has_error)
+{
+   struct ioat_dm

[dpdk-dev] [PATCH v7 08/12] dma/ioat: add statistics

2021-10-14 Thread Conor Walsh
Add statistic tracking for operations in IOAT.

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
Acked-by: Bruce Richardson 
---
 drivers/dma/ioat/ioat_dmadev.c | 43 ++
 1 file changed, 43 insertions(+)

diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index 0318f67772..b731361f9a 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -77,6 +77,9 @@ ioat_vchan_setup(struct rte_dma_dev *dev, uint16_t vchan 
__rte_unused,
ioat->offset = 0;
ioat->failure = 0;
 
+   /* Reset Stats. */
+   ioat->stats = (struct rte_dma_stats){0};
+
/* Configure descriptor ring - each one points to next. */
for (i = 0; i < ioat->qcfg.nb_desc; i++) {
ioat->desc_ring[i].next = ioat->ring_addr +
@@ -222,6 +225,8 @@ __submit(struct ioat_dmadev *ioat)
 {
*ioat->doorbell = ioat->next_write - ioat->offset;
 
+   ioat->stats.submitted += (uint16_t)(ioat->next_write - ioat->last_write);
+
ioat->last_write = ioat->next_write;
 }
 
@@ -352,6 +357,10 @@ __dev_dump(void *dev_private, FILE *f)
fprintf(f, "Dest: 0x%"PRIx64"\n", ioat->desc_ring[ioat->next_read & 
mask].dest_addr);
fprintf(f, "Next: 0x%"PRIx64"\n", ioat->desc_ring[ioat->next_read & 
mask].next);
fprintf(f, "  }\n");
+   fprintf(f, "  Key Stats { submitted: %"PRIu64", comp: %"PRIu64", failed: %"PRIu64" }\n",
+   ioat->stats.submitted,
+   ioat->stats.completed,
+   ioat->stats.errors);
 
return 0;
 }
@@ -448,6 +457,9 @@ ioat_completed(void *dev_private, uint16_t qid 
__rte_unused, const uint16_t max_
*last_idx = ioat->next_read - 2;
}
 
+   ioat->stats.completed += count;
+   ioat->stats.errors += fails;
+
return count;
 }
 
@@ -498,9 +510,38 @@ ioat_completed_status(void *dev_private, uint16_t qid 
__rte_unused,
 
*last_idx = ioat->next_read - 1;
 
+   ioat->stats.completed += count;
+   ioat->stats.errors += fails;
+
return count;
 }
 
+/* Retrieve the generic stats of a DMA device. */
+static int
+ioat_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+   struct rte_dma_stats *rte_stats, uint32_t size)
+{
+   struct rte_dma_stats *stats = (&((struct ioat_dmadev *)dev->dev_private)->stats);
+
+   if (size < sizeof(*rte_stats))
+   return -EINVAL;
+   if (rte_stats == NULL)
+   return -EINVAL;
+
+   *rte_stats = *stats;
+   return 0;
+}
+
+/* Reset the generic stat counters for the DMA device. */
+static int
+ioat_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
+{
+   struct ioat_dmadev *ioat = dev->dev_private;
+
+   ioat->stats = (struct rte_dma_stats){0};
+   return 0;
+}
+
 /* Create a DMA device. */
 static int
 ioat_dmadev_create(const char *name, struct rte_pci_device *dev)
@@ -512,6 +553,8 @@ ioat_dmadev_create(const char *name, struct rte_pci_device 
*dev)
.dev_info_get = ioat_dev_info_get,
.dev_start = ioat_dev_start,
.dev_stop = ioat_dev_stop,
+   .stats_get = ioat_stats_get,
+   .stats_reset = ioat_stats_reset,
.vchan_setup = ioat_vchan_setup,
};
 
-- 
2.25.1
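
Reviewer note on the `(uint16_t)` cast in the `stats.submitted` update above: the write counters are 16 bits wide and wrap, so the number of jobs submitted since the last doorbell has to be computed modulo 65536 before it is added to the 64-bit total. A minimal sketch of that accounting follows. `struct sub_stats` and `record_submit()` are hypothetical names for illustration only, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the driver state needed for submit accounting. */
struct sub_stats {
	uint64_t submitted;  /* running total of submitted jobs */
	uint16_t next_write; /* free-running enqueue counter */
	uint16_t last_write; /* value of next_write at the previous submit */
};

/* Add the jobs enqueued since the previous submit to the running total. */
static void
record_submit(struct sub_stats *s)
{
	assert(s != NULL);
	/* The subtraction is done in int after integer promotion; the cast
	 * back to uint16_t keeps the delta correct across counter wrap. */
	s->submitted += (uint16_t)(s->next_write - s->last_write);
	s->last_write = s->next_write;
}
```

Without the cast, a wrap of `next_write` would yield a negative intermediate value, which would then be converted to a very large unsigned number when added to the 64-bit counter.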



[dpdk-dev] [PATCH v7 09/12] dma/ioat: add support for vchan status function

2021-10-14 Thread Conor Walsh
Add support for the rte_dmadev_vchan_status API call.
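
The idle check in this patch can be sketched in isolation as follows. This is only an illustration under assumptions: the enum and the function name `demo_classify_vchan` are hypothetical stand-ins, and the hardware state is reduced to a single boolean rather than the IOAT channel status values used by the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative states only; the patch itself uses enum rte_dma_vchan_status. */
enum demo_vchan_status {
	DEMO_VCHAN_IDLE,
	DEMO_VCHAN_ACTIVE,
	DEMO_VCHAN_HALTED_ERROR
};

/* Hypothetical helper: nb_desc is assumed to be a power of two. */
static enum demo_vchan_status
demo_classify_vchan(bool hw_halted, uint16_t last_completed,
		uint16_t next_write, uint16_t nb_desc)
{
	const uint16_t mask = (uint16_t)(nb_desc - 1);

	if (hw_halted)
		return DEMO_VCHAN_HALTED_ERROR;
	/* Idle when the last completed slot is the last slot written. */
	if (last_completed == (uint16_t)((next_write - 1) & mask))
		return DEMO_VCHAN_IDLE;
	return DEMO_VCHAN_ACTIVE;
}
```

Masking after the subtraction keeps the comparison valid when next_write has just wrapped to zero.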

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
Acked-by: Bruce Richardson 
---
 drivers/dma/ioat/ioat_dmadev.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index b731361f9a..17ac3217c7 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -542,6 +542,26 @@ ioat_stats_reset(struct rte_dma_dev *dev, uint16_t vchan 
__rte_unused)
return 0;
 }
 
+/* Check if the IOAT device is idle. */
+static int
+ioat_vchan_status(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
+   enum rte_dma_vchan_status *status)
+{
+   int state = 0;
+   const struct ioat_dmadev *ioat = dev->dev_private;
+   const uint16_t mask = ioat->qcfg.nb_desc - 1;
+   const uint16_t last = __get_last_completed(ioat, &state);
+
+   if (state == IOAT_CHANSTS_HALTED || state == IOAT_CHANSTS_SUSPENDED)
+   *status = RTE_DMA_VCHAN_HALTED_ERROR;
+   else if (last == ((ioat->next_write - 1) & mask))
+   *status = RTE_DMA_VCHAN_IDLE;
+   else
+   *status = RTE_DMA_VCHAN_ACTIVE;
+
+   return 0;
+}
+
 /* Create a DMA device. */
 static int
 ioat_dmadev_create(const char *name, struct rte_pci_device *dev)
@@ -555,6 +575,7 @@ ioat_dmadev_create(const char *name, struct rte_pci_device 
*dev)
.dev_stop = ioat_dev_stop,
.stats_get = ioat_stats_get,
.stats_reset = ioat_stats_reset,
+   .vchan_status = ioat_vchan_status,
.vchan_setup = ioat_vchan_setup,
};
 
-- 
2.25.1



[dpdk-dev] [PATCH v7 10/12] dma/ioat: add burst capacity function

2021-10-14 Thread Conor Walsh
Adds the ability to find the remaining space in the IOAT ring.
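
A minimal sketch of the capacity arithmetic, assuming free-running 16-bit read/write indices and a power-of-two ring with one slot kept unused (as the `nb_desc - 1` in the patch suggests). The name `ring_free_space` is hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: free slots in a ring of nb_desc descriptors. */
static uint16_t
ring_free_space(uint16_t nb_desc, uint16_t next_read, uint16_t next_write)
{
	/* The arithmetic below relies on nb_desc being a power of two. */
	assert(nb_desc != 0 && (nb_desc & (nb_desc - 1)) == 0);

	/* The cast truncates the promoted int difference back to 16 bits,
	 * so the in-flight count stays correct across index wraparound.
	 */
	uint16_t in_flight = (uint16_t)(next_write - next_read);

	return (uint16_t)((nb_desc - 1) - in_flight);
}
```

For example, with 8 descriptors, next_read = 65534 and next_write = 2, four operations are in flight and three slots remain.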

Signed-off-by: Conor Walsh 
Signed-off-by: Kevin Laatz 
Acked-by: Bruce Richardson 
---
 drivers/dma/ioat/ioat_dmadev.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index 17ac3217c7..a230496b11 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -516,12 +516,25 @@ ioat_completed_status(void *dev_private, uint16_t qid 
__rte_unused,
return count;
 }
 
+/* Get the remaining capacity of the ring. */
+static uint16_t
+ioat_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
+{
+   const struct ioat_dmadev *ioat = dev_private;
+   unsigned short size = ioat->qcfg.nb_desc - 1;
+   unsigned short read = ioat->next_read;
+   unsigned short write = ioat->next_write;
+   unsigned short space = size - (write - read);
+
+   return space;
+}
+
 /* Retrieve the generic stats of a DMA device. */
 static int
 ioat_stats_get(const struct rte_dma_dev *dev, uint16_t vchan __rte_unused,
struct rte_dma_stats *rte_stats, uint32_t size)
 {
-   struct rte_dma_stats *stats = (&((struct ioat_dmadev 
*)dev->dev_private)->stats);
+   struct rte_dma_stats *stats = (&((struct ioat_dmadev 
*)dev->fp_obj->dev_private)->stats);
 
if (size < sizeof(rte_stats))
return -EINVAL;
@@ -536,7 +549,7 @@ ioat_stats_get(const struct rte_dma_dev *dev, uint16_t 
vchan __rte_unused,
 static int
 ioat_stats_reset(struct rte_dma_dev *dev, uint16_t vchan __rte_unused)
 {
-   struct ioat_dmadev *ioat = dev->dev_private;
+   struct ioat_dmadev *ioat = dev->fp_obj->dev_private;
 
ioat->stats = (struct rte_dma_stats){0};
return 0;
@@ -548,7 +561,7 @@ ioat_vchan_status(const struct rte_dma_dev *dev, uint16_t 
vchan __rte_unused,
enum rte_dma_vchan_status *status)
 {
int state = 0;
-   const struct ioat_dmadev *ioat = dev->dev_private;
+   const struct ioat_dmadev *ioat = dev->fp_obj->dev_private;
const uint16_t mask = ioat->qcfg.nb_desc - 1;
const uint16_t last = __get_last_completed(ioat, &state);
 
@@ -601,6 +614,7 @@ ioat_dmadev_create(const char *name, struct rte_pci_device 
*dev)
 
dmadev->dev_ops = &ioat_dmadev_ops;
 
+   dmadev->fp_obj->burst_capacity = ioat_burst_capacity;
dmadev->fp_obj->completed = ioat_completed;
dmadev->fp_obj->completed_status = ioat_completed_status;
dmadev->fp_obj->copy = ioat_enqueue_copy;
-- 
2.25.1



[dpdk-dev] [PATCH v7 11/12] devbind: move ioat device IDs to dmadev category

2021-10-14 Thread Conor Walsh
Move Intel IOAT devices from Misc to DMA devices.

Signed-off-by: Conor Walsh 
Reviewed-by: Kevin Laatz 
Reviewed-by: Bruce Richardson 
---
 usertools/dpdk-devbind.py | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index ba18e2a487..91f1b16bde 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -71,14 +71,13 @@
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
 crypto_devices = [encryption_class, intel_processor_class]
-dma_devices = [intel_idxd_spr]
+dma_devices = [intel_idxd_spr, intel_ioat_bdw, intel_ioat_icx, intel_ioat_skx]
 eventdev_devices = [cavium_sso, cavium_tim, intel_dlb, octeontx2_sso]
 mempool_devices = [cavium_fpa, octeontx2_npa]
 compress_devices = [cavium_zip]
 regex_devices = [octeontx2_ree]
-misc_devices = [cnxk_bphy, cnxk_bphy_cgx, cnxk_inl_dev, intel_ioat_bdw,
-   intel_ioat_skx, intel_ioat_icx, intel_ntb_skx,
-   intel_ntb_icx, octeontx2_dma]
+misc_devices = [cnxk_bphy, cnxk_bphy_cgx, cnxk_inl_dev, intel_ntb_skx,
+intel_ntb_icx, octeontx2_dma]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
-- 
2.25.1



[dpdk-dev] [PATCH v7 12/12] raw/ioat: deprecate ioat rawdev driver

2021-10-14 Thread Conor Walsh
Deprecate the rawdev IOAT driver as both IOAT and IDXD drivers have
moved to dmadev.

Signed-off-by: Conor Walsh 
Acked-by: Kevin Laatz 
Acked-by: Bruce Richardson 
---
 MAINTAINERS  | 2 +-
 doc/guides/rawdevs/ioat.rst  | 4 
 doc/guides/rel_notes/deprecation.rst | 7 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 283c70f7d7..b9f7746dc4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1322,7 +1322,7 @@ T: git://dpdk.org/next/dpdk-next-net-intel
 F: drivers/raw/ifpga/
 F: doc/guides/rawdevs/ifpga.rst
 
-IOAT Rawdev
+IOAT Rawdev - DEPRECATED
 M: Bruce Richardson 
 F: drivers/raw/ioat/
 F: doc/guides/rawdevs/ioat.rst
diff --git a/doc/guides/rawdevs/ioat.rst b/doc/guides/rawdevs/ioat.rst
index a65530bd30..98d15dd032 100644
--- a/doc/guides/rawdevs/ioat.rst
+++ b/doc/guides/rawdevs/ioat.rst
@@ -6,6 +6,10 @@
 IOAT Rawdev Driver
 ===
 
+.. warning::
+As of DPDK 21.11 the rawdev implementation of the IOAT driver has been 
deprecated.
+Please use the dmadev library instead.
+
 The ``ioat`` rawdev driver provides a poll-mode driver (PMD) for Intel\ |reg|
 Data Streaming Accelerator `(Intel DSA)
 `_ and 
for Intel\ |reg|
diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index a4e86b31f5..fcdd24c296 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -241,3 +241,10 @@ Deprecation Notices
 * cmdline: ``cmdline`` structure will be made opaque to hide platform-specific
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
+
+* raw/ioat: The ``ioat`` rawdev driver has been deprecated, since its
+  functionality is provided through the new ``dmadev`` infrastructure.
+  To continue to use hardware previously supported by the ``ioat`` rawdev driver,
+  applications should be updated to use the ``dmadev`` library instead,
+  with the underlying HW-functionality being provided by the ``ioat`` or
+  ``idxd`` dma drivers.
-- 
2.25.1



[dpdk-dev] [PATCH] net/af_xdp: use bpf link for XDP programs

2021-10-14 Thread Ciara Loftus
Since v0.4.0, if the underlying kernel supports it, libbpf uses 'bpf link'
to manage the programs on the interfaces of the xsks. This has two
repercussions for the PMD.

1. In the case where the PMD asks libbpf to load the default XDP program,
the PMD no longer needs to remove it on teardown. This is because bpf link
handles the unloading under the hood.
2. In the case where the PMD loads a custom program, libbpf expects this
program to be linked via bpf link prior to creating the socket.

This patch introduces probes for the libbpf version and kernel support
for bpf link and orchestrates the loading and unloading of
programs according to the capabilities of the kernel and libbpf. The
libbpf version is checked with meson and pkg-config. The probe for
kernel support mirrors how it is implemented in libbpf. A bpf_link is
created and looked up on the loopback device. If successful, bpf_link will
be used for the AF_XDP netdev.

Signed-off-by: Ciara Loftus 
Signed-off-by: Maciej Fijalkowski 
---
 drivers/net/af_xdp/compat.h | 120 
 drivers/net/af_xdp/meson.build  |   7 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c |  13 +--
 3 files changed, 135 insertions(+), 5 deletions(-)

diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
index 3880dc7dd7..1243de436c 100644
--- a/drivers/net/af_xdp/compat.h
+++ b/drivers/net/af_xdp/compat.h
@@ -2,9 +2,11 @@
  * Copyright(c) 2020 Intel Corporation.
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #if KERNEL_VERSION(5, 10, 0) <= LINUX_VERSION_CODE && \
defined(RTE_LIBRTE_AF_XDP_PMD_SHARED_UMEM)
@@ -54,3 +56,121 @@ tx_syscall_needed(struct xsk_ring_prod *q __rte_unused)
return 1;
 }
 #endif
+
+#ifdef RTE_LIBRTE_AF_XDP_PMD_BPF_LINK
+static int link_lookup(int ifindex, int *link_fd)
+{
+   struct bpf_link_info link_info;
+   __u32 link_len;
+   __u32 id = 0;
+   int err;
+   int fd;
+
+   while (true) {
+   err = bpf_link_get_next_id(id, &id);
+   if (err) {
+   if (errno == ENOENT) {
+   err = 0;
+   break;
+   }
+   break;
+   }
+
+   fd = bpf_link_get_fd_by_id(id);
+   if (fd < 0) {
+   if (errno == ENOENT)
+   continue;
+   err = -errno;
+   break;
+   }
+
+   link_len = sizeof(struct bpf_link_info);
+   memset(&link_info, 0, link_len);
+   err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
+   if (err) {
+   close(fd);
+   break;
+   }
+   if (link_info.type == BPF_LINK_TYPE_XDP) {
+   if ((int)link_info.xdp.ifindex == ifindex) {
+   *link_fd = fd;
+   break;
+   }
+   }
+   close(fd);
+   }
+
+   return err;
+}
+
+static bool probe_bpf_link(void)
+{
+   DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
+   .flags = XDP_FLAGS_SKB_MODE);
+   struct bpf_load_program_attr prog_attr;
+   struct bpf_insn insns[2] = {
+   BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
+   BPF_EXIT_INSN()
+   };
+   int prog_fd, link_fd = -1;
+   int ifindex_lo = 1;
+   bool ret = false;
+   int err;
+
+   err = link_lookup(ifindex_lo, &link_fd);
+   if (err)
+   return ret;
+
+   if (link_fd >= 0)
+   return true;
+
+   memset(&prog_attr, 0, sizeof(prog_attr));
+   prog_attr.prog_type = BPF_PROG_TYPE_XDP;
+   prog_attr.insns = insns;
+   prog_attr.insns_cnt = RTE_DIM(insns);
+   prog_attr.license = "GPL";
+
+   prog_fd = bpf_load_program_xattr(&prog_attr, NULL, 0);
+   if (prog_fd < 0)
+   return ret;
+
+   link_fd = bpf_link_create(prog_fd, ifindex_lo, BPF_XDP, &opts);
+   close(prog_fd);
+
+   if (link_fd >= 0) {
+   ret = true;
+   close(link_fd);
+   }
+
+   return ret;
+}
+
+static int link_xdp_program(int if_index, int prog_fd, bool use_bpf_link)
+{
+   DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+   int link_fd, ret = 0;
+
+   if (!use_bpf_link)
+   return bpf_set_link_xdp_fd(if_index, prog_fd,
+  XDP_FLAGS_UPDATE_IF_NOEXIST);
+
+   opts.flags = 0;
+   link_fd = bpf_link_create(prog_fd, if_index, BPF_XDP, &opts);
+   if (link_fd < 0)
+   ret = -1;
+
+   return ret;
+}
+#else
+static bool probe_bpf_link(void)
+{
+   return false;
+}
+
+static int link_xdp_program(int if_index, int prog_fd,
+   bool use_bpf_link __rte_unused)
+{
+   return bpf_set_link_xdp_fd(if_ind

[dpdk-dev] [PATCH v4 0/8] port ioatfwd app to dmadev

2021-10-14 Thread Kevin Laatz
This patchset first adds some additional command line options to the
existing ioatfwd application to enhance usability.

The last 3 patches of this set then port the ioatfwd application to use the
dmadev library APIs instead of the IOAT rawdev APIs. Following the port,
all variables etc. are renamed to be more appropriate for use with the
dmadev library. Lastly, the application itself is renamed to "dmafwd".

Depends-on: series-19594 ("support dmadev")

---
v4:
  - rebase on dmadev lib v26 patchset
v3:
  - add signal-triggered device dump
  - add cmd line option to control stats print frequency
  - documentation updates
  - small miscellaneous changes from review feedback

Kevin Laatz (5):
  examples/ioat: add cmd line option to control stats print interval
  examples/ioat: add signal-triggered device dumps
  examples/ioat: port application to dmadev APIs
  examples/ioat: update naming to match change to dmadev
  examples/ioat: rename application to dmafwd

Konstantin Ananyev (3):
  examples/ioat: always use same lcore for both DMA requests enqueue and
dequeue
  examples/ioat: add cmd line option to control DMA batch size
  examples/ioat: add cmd line option to control max frame size

 .../sample_app_ug/{ioat.rst => dma.rst}   | 149 ++---
 doc/guides/sample_app_ug/index.rst|   2 +-
 doc/guides/sample_app_ug/intro.rst|   4 +-
 examples/{ioat => dma}/Makefile   |   4 +-
 examples/{ioat/ioatfwd.c => dma/dmafwd.c} | 632 ++
 examples/{ioat => dma}/meson.build|  10 +-
 examples/meson.build  |   2 +-
 7 files changed, 427 insertions(+), 376 deletions(-)
 rename doc/guides/sample_app_ug/{ioat.rst => dma.rst} (64%)
 rename examples/{ioat => dma}/Makefile (97%)
 rename examples/{ioat/ioatfwd.c => dma/dmafwd.c} (60%)
 rename examples/{ioat => dma}/meson.build (63%)

-- 
2.30.2



[dpdk-dev] [PATCH v4 1/8] examples/ioat: always use same lcore for both DMA requests enqueue and dequeue

2021-10-14 Thread Kevin Laatz
From: Konstantin Ananyev 

A few changes to the ioat sample behaviour:
- Always do SW copy for packet metadata (mbuf fields)
- Always use same lcore for both DMA requests enqueue and dequeue

Main reasons for that:
a) it is safer, as idxd PMD doesn't support MT safe enqueue/dequeue (yet).
b) it gives a more apples-to-apples comparison with SW copy.
c) from my testing things are faster that way.

Documentation updates to reflect these changes are also included.
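
The SW metadata copy covers a contiguous range of fields inside the mbuf. A rough sketch of that technique on a made-up structure is shown below; `struct demo_buf`, its field layout, and `demo_metadata_copy()` are assumptions for illustration and do not match the real `rte_mbuf` layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf; field names are illustrative only. */
struct demo_buf {
	void *buf_addr;    /* per-buffer state, must not be copied */
	uint32_t pkt_len;  /* first metadata field to copy */
	uint16_t data_len;
	uint16_t vlan_tci;
	uint16_t buf_len;  /* per-buffer state, first field NOT copied */
};

/* Copy the fields from pkt_len up to, but not including, buf_len. */
static void
demo_metadata_copy(const struct demo_buf *src, struct demo_buf *dst)
{
	const size_t start = offsetof(struct demo_buf, pkt_len);
	const size_t end = offsetof(struct demo_buf, buf_len);

	memcpy((char *)dst + start, (const char *)src + start, end - start);
}
```

Bounding the copy with offsetof() leaves the destination's own buffer address and length untouched.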

Signed-off-by: Konstantin Ananyev 
Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 doc/guides/sample_app_ug/ioat.rst |  39 +++
 examples/ioat/ioatfwd.c   | 185 --
 2 files changed, 117 insertions(+), 107 deletions(-)

diff --git a/doc/guides/sample_app_ug/ioat.rst 
b/doc/guides/sample_app_ug/ioat.rst
index ee0a627b06..2e9d3d6258 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -183,10 +183,8 @@ After that each port application assigns resources needed.
 :end-before: >8 End of assigning each port resources.
 :dedent: 1
 
-Depending on mode set (whether copy should be done by software or by hardware)
-special structures are assigned to each port. If software copy was chosen,
-application have to assign ring structures for packet exchanging between lcores
-assigned to ports.
+Ring structures are assigned for exchanging packets between lcores for both SW
+and HW copy modes.
 
 .. literalinclude:: ../../../examples/ioat/ioatfwd.c
 :language: c
@@ -275,12 +273,8 @@ copying device's buffer using ``ioat_enqueue_packets()`` 
which calls
 buffer the copy operations are started by calling ``rte_ioat_perform_ops()``.
 Function ``rte_ioat_enqueue_copy()`` operates on physical address of
 the packet. Structure ``rte_mbuf`` contains only physical address to
-start of the data buffer (``buf_iova``). Thus the address is adjusted
-by ``addr_offset`` value in order to get the address of ``rearm_data``
-member of ``rte_mbuf``. That way both the packet data and metadata can
-be copied in a single operation. This method can be used because the mbufs
-are direct mbufs allocated by the apps. If another app uses external buffers,
-or indirect mbufs, then multiple copy operations must be used.
+start of the data buffer (``buf_iova``). Thus the ``rte_pktmbuf_iova()`` API is
+used to get the address of the start of the data within the mbuf.
 
 .. literalinclude:: ../../../examples/ioat/ioatfwd.c
 :language: c
@@ -289,12 +283,13 @@ or indirect mbufs, then multiple copy operations must be 
used.
 :dedent: 0
 
 
-All completed copies are processed by ``ioat_tx_port()`` function. When using
-hardware copy mode the function invokes ``rte_ioat_completed_ops()``
-on each assigned IOAT channel to gather copied packets. If software copy
-mode is used the function dequeues copied packets from the rte_ring. Then each
-packet MAC address is changed if it was enabled. After that copies are sent
-in burst mode using `` rte_eth_tx_burst()``.
+Once the copies have been completed (this includes gathering the completions in
+HW copy mode), the copied packets are enqueued to the ``rx_to_tx_ring``, which
+is used to pass the packets to the TX function.
+
+All completed copies are processed by ``ioat_tx_port()`` function. This 
function
+dequeues copied packets from the ``rx_to_tx_ring``. Then each packet MAC 
address is changed
+if it was enabled. After that copies are sent in burst mode using 
``rte_eth_tx_burst()``.
 
 
 .. literalinclude:: ../../../examples/ioat/ioatfwd.c
@@ -306,11 +301,9 @@ in burst mode using `` rte_eth_tx_burst()``.
 The Packet Copying Functions
 ~~
 
-In order to perform packet copy there is a user-defined function
-``pktmbuf_sw_copy()`` used. It copies a whole packet by copying
-metadata from source packet to new mbuf, and then copying a data
-chunk of source packet. Both memory copies are done using
-``rte_memcpy()``:
+In order to perform SW packet copy, there are user-defined functions to first 
copy
+the packet metadata (``pktmbuf_metadata_copy()``) and then the packet data
+(``pktmbuf_sw_copy()``):
 
 .. literalinclude:: ../../../examples/ioat/ioatfwd.c
 :language: c
@@ -318,8 +311,8 @@ chunk of source packet. Both memory copies are done using
 :end-before: >8 End of perform packet copy there is a user-defined 
function.
 :dedent: 0
 
-The metadata in this example is copied from ``rearm_data`` member of
-``rte_mbuf`` struct up to ``cacheline1``.
+The metadata in this example is copied from ``rx_descriptor_fields1`` marker of
+``rte_mbuf`` struct up to ``buf_len`` member.
 
 In order to understand why software packet copying is done as shown
 above please refer to the "Mbuf Library" section of the
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index ff36aa7f1e..bf12bb9ba9 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -331,43 +331,36 @@ update_mac_addrs(struct rte_mbuf *m, uint32_t dest_po

[dpdk-dev] [PATCH v4 2/8] examples/ioat: add cmd line option to control DMA batch size

2021-10-14 Thread Kevin Laatz
From: Konstantin Ananyev 

Add a command line option to control the HW copy batch size in the
application.
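
The batching loop can be sketched without any DPDK dependency as below. The callback type, `demo_limited_enqueue()` and `enqueue_in_batches()` are hypothetical names; the real code calls the application's enqueue function and then rings the device doorbell after each non-empty batch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enqueue callback: returns how many of count were accepted. */
typedef uint32_t (*demo_enqueue_fn)(uint32_t count, void *ctx);

/* Stand-in for a hardware queue: accepts at most *room more requests. */
static uint32_t
demo_limited_enqueue(uint32_t count, void *ctx)
{
	uint32_t *room = ctx;
	uint32_t n = count < *room ? count : *room;

	*room -= n;
	return n;
}

/* Submit num requests in chunks of at most step; stop once the queue is full. */
static uint32_t
enqueue_in_batches(uint32_t num, uint32_t step, demo_enqueue_fn enq, void *ctx)
{
	uint32_t i, m, n, done = 0;

	assert(step > 0); /* a zero step would never make progress */
	for (i = 0; i < num; i += m) {
		m = (num - i < step) ? num - i : step;
		n = enq(m, ctx);
		done += n;
		if (n != m) /* partial batch: the queue is full */
			break;
	}
	return done;
}
```

The sketch asserts that the step is non-zero, since a batch size of zero would leave the loop index unchanged on every pass.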

Signed-off-by: Konstantin Ananyev 
Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 doc/guides/sample_app_ug/ioat.rst |  4 +++-
 examples/ioat/ioatfwd.c   | 40 ---
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/doc/guides/sample_app_ug/ioat.rst 
b/doc/guides/sample_app_ug/ioat.rst
index 2e9d3d6258..404ca2e19a 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -46,7 +46,7 @@ The application requires a number of command line options:
 .. code-block:: console
 
 .//examples/dpdk-ioat [EAL options] -- [-p MASK] [-q NQ] [-s 
RS] [-c ]
-[--[no-]mac-updating]
+[--[no-]mac-updating] [-b BS]
 
 where,
 
@@ -64,6 +64,8 @@ where,
 *   --[no-]mac-updating: Whether MAC address of packets should be changed
 or not (default is mac-updating)
 
+*   b BS: set the DMA batch size
+
 The application can be launched in various configurations depending on
 provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index bf12bb9ba9..e73d79271b 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -24,6 +24,7 @@
 #define CMD_LINE_OPT_NB_QUEUE "nb-queue"
 #define CMD_LINE_OPT_COPY_TYPE "copy-type"
 #define CMD_LINE_OPT_RING_SIZE "ring-size"
+#define CMD_LINE_OPT_BATCH_SIZE "dma-batch-size"
 
 /* configurable number of RX/TX ring descriptors */
 #define RX_DEFAULT_RINGSIZE 1024
@@ -102,6 +103,8 @@ static uint16_t nb_txd = TX_DEFAULT_RINGSIZE;
 
 static volatile bool force_quit;
 
+static uint32_t ioat_batch_sz = MAX_PKT_BURST;
+
 /* ethernet addresses of ports */
 static struct rte_ether_addr ioat_ports_eth_addr[RTE_MAX_ETHPORTS];
 
@@ -374,15 +377,25 @@ ioat_enqueue_packets(struct rte_mbuf *pkts[], struct 
rte_mbuf *pkts_copy[],
 
 static inline uint32_t
 ioat_enqueue(struct rte_mbuf *pkts[], struct rte_mbuf *pkts_copy[],
-   uint32_t num, uint16_t dev_id)
+   uint32_t num, uint32_t step, uint16_t dev_id)
 {
-   uint32_t n;
+   uint32_t i, k, m, n;
+
+   k = 0;
+   for (i = 0; i < num; i += m) {
+
+   m = RTE_MIN(step, num - i);
+   n = ioat_enqueue_packets(pkts + i, pkts_copy + i, m, dev_id);
+   k += n;
+   if (n > 0)
+   rte_ioat_perform_ops(dev_id);
 
-   n = ioat_enqueue_packets(pkts, pkts_copy, num, dev_id);
-   if (n > 0)
-   rte_ioat_perform_ops(dev_id);
+   /* don't try to enqueue more if HW queue is full */
+   if (n != m)
+   break;
+   }
 
-   return n;
+   return k;
 }
 
 static inline uint32_t
@@ -439,7 +452,7 @@ ioat_rx_port(struct rxtx_port_config *rx_config)
 
/* enqueue packets for  hardware copy */
nb_enq = ioat_enqueue(pkts_burst, pkts_burst_copy,
-   nb_rx, rx_config->ioat_ids[i]);
+   nb_rx, ioat_batch_sz, rx_config->ioat_ids[i]);
 
/* free any not enqueued packets. */
rte_mempool_put_bulk(ioat_pktmbuf_pool,
@@ -590,6 +603,7 @@ static void
 ioat_usage(const char *prgname)
 {
printf("%s [EAL options] -- -p PORTMASK [-q NQ]\n"
+   "  -b --dma-batch-size: number of requests per DMA batch\n"
"  -p --portmask: hexadecimal bitmask of ports to configure\n"
"  -q NQ: number of RX queues per port (default is 1)\n"
"  --[no-]mac-updating: Enable or disable MAC addresses 
updating (enabled by default)\n"
@@ -631,9 +645,10 @@ static int
 ioat_parse_args(int argc, char **argv, unsigned int nb_ports)
 {
static const char short_options[] =
+   "b:"  /* dma batch size */
+   "c:"  /* copy type (sw|hw) */
"p:"  /* portmask */
"q:"  /* number of RX queues per port */
-   "c:"  /* copy type (sw|hw) */
"s:"  /* ring size */
;
 
@@ -644,6 +659,7 @@ ioat_parse_args(int argc, char **argv, unsigned int 
nb_ports)
{CMD_LINE_OPT_NB_QUEUE, required_argument, NULL, 'q'},
{CMD_LINE_OPT_COPY_TYPE, required_argument, NULL, 'c'},
{CMD_LINE_OPT_RING_SIZE, required_argument, NULL, 's'},
+   {CMD_LINE_OPT_BATCH_SIZE, required_argument, NULL, 'b'},
{NULL, 0, 0, 0}
};
 
@@ -660,6 +676,14 @@ ioat_parse_args(int argc, char **argv, unsigned int 
nb_ports)
lgopts, &option_index)) != EOF) {
 
switch (opt) {
+   case 'b':
+   ioat_batch_sz = atoi(optarg);
+   if (ioat_batch_sz > MAX_PKT_BUR

[dpdk-dev] [PATCH v4 3/8] examples/ioat: add cmd line option to control max frame size

2021-10-14 Thread Kevin Laatz
From: Konstantin Ananyev 

Add command line option for setting the max frame size.
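
The patch parses the value with atoi(). As an alternative sketch (not what the patch does), a bounded parser built on strtoul() would also reject trailing garbage and empty strings; `parse_bounded_u32()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal value in [0, max]; 0 on success, -1 on error. */
static int
parse_bounded_u32(const char *arg, uint32_t max, uint32_t *out)
{
	char *end = NULL;
	unsigned long val;

	assert(arg != NULL && out != NULL);

	errno = 0;
	val = strtoul(arg, &end, 10);
	/* Reject range errors, empty input, trailing characters, and values above max. */
	if (errno != 0 || end == arg || *end != '\0' || val > max)
		return -1;

	*out = (uint32_t)val;
	return 0;
}
```

A negative input such as "-1" is converted by strtoul() to a very large unsigned value, so the upper-bound check rejects it as well.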

Signed-off-by: Konstantin Ananyev 
Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 doc/guides/sample_app_ug/ioat.rst |  4 +++-
 examples/ioat/ioatfwd.c   | 25 +++--
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/doc/guides/sample_app_ug/ioat.rst 
b/doc/guides/sample_app_ug/ioat.rst
index 404ca2e19a..127129dd4b 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -46,7 +46,7 @@ The application requires a number of command line options:
 .. code-block:: console
 
 .//examples/dpdk-ioat [EAL options] -- [-p MASK] [-q NQ] [-s 
RS] [-c ]
-[--[no-]mac-updating] [-b BS]
+[--[no-]mac-updating] [-b BS] [-f FS]
 
 where,
 
@@ -66,6 +66,8 @@ where,
 
 *   b BS: set the DMA batch size
 
+*   f FS: set the max frame size
+
 The application can be launched in various configurations depending on
 provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index e73d79271b..04ed175432 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -25,6 +25,7 @@
 #define CMD_LINE_OPT_COPY_TYPE "copy-type"
 #define CMD_LINE_OPT_RING_SIZE "ring-size"
 #define CMD_LINE_OPT_BATCH_SIZE "dma-batch-size"
+#define CMD_LINE_OPT_FRAME_SIZE "max-frame-size"
 
 /* configurable number of RX/TX ring descriptors */
 #define RX_DEFAULT_RINGSIZE 1024
@@ -104,6 +105,7 @@ static uint16_t nb_txd = TX_DEFAULT_RINGSIZE;
 static volatile bool force_quit;
 
 static uint32_t ioat_batch_sz = MAX_PKT_BURST;
+static uint32_t max_frame_size = RTE_ETHER_MAX_LEN;
 
 /* ethernet addresses of ports */
 static struct rte_ether_addr ioat_ports_eth_addr[RTE_MAX_ETHPORTS];
@@ -604,6 +606,7 @@ ioat_usage(const char *prgname)
 {
printf("%s [EAL options] -- -p PORTMASK [-q NQ]\n"
"  -b --dma-batch-size: number of requests per DMA batch\n"
+   "  -f --max-frame-size: max frame size\n"
"  -p --portmask: hexadecimal bitmask of ports to configure\n"
"  -q NQ: number of RX queues per port (default is 1)\n"
"  --[no-]mac-updating: Enable or disable MAC addresses 
updating (enabled by default)\n"
@@ -647,6 +650,7 @@ ioat_parse_args(int argc, char **argv, unsigned int 
nb_ports)
static const char short_options[] =
"b:"  /* dma batch size */
"c:"  /* copy type (sw|hw) */
+   "f:"  /* max frame size */
"p:"  /* portmask */
"q:"  /* number of RX queues per port */
"s:"  /* ring size */
@@ -660,6 +664,7 @@ ioat_parse_args(int argc, char **argv, unsigned int 
nb_ports)
{CMD_LINE_OPT_COPY_TYPE, required_argument, NULL, 'c'},
{CMD_LINE_OPT_RING_SIZE, required_argument, NULL, 's'},
{CMD_LINE_OPT_BATCH_SIZE, required_argument, NULL, 'b'},
+   {CMD_LINE_OPT_FRAME_SIZE, required_argument, NULL, 'f'},
{NULL, 0, 0, 0}
};
 
@@ -684,6 +689,15 @@ ioat_parse_args(int argc, char **argv, unsigned int 
nb_ports)
return -1;
}
break;
+   case 'f':
+   max_frame_size = atoi(optarg);
+   if (max_frame_size > RTE_ETHER_MAX_JUMBO_FRAME_LEN) {
+   printf("Invalid max frame size, %s.\n", optarg);
+   ioat_usage(prgname);
+   return -1;
+   }
+   break;
+
/* portmask */
case 'p':
ioat_enabled_port_mask = ioat_parse_portmask(optarg);
@@ -880,6 +894,11 @@ port_init(uint16_t portid, struct rte_mempool *mbuf_pool, 
uint16_t nb_queues)
struct rte_eth_dev_info dev_info;
int ret, i;
 
+   if (max_frame_size > local_port_conf.rxmode.max_rx_pkt_len) {
+   local_port_conf.rxmode.max_rx_pkt_len = max_frame_size;
+   local_port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
+   }
+
/* Skip ports that are not enabled */
if ((ioat_enabled_port_mask & (1 << portid)) == 0) {
printf("Skipping disabled port %u\n", portid);
@@ -990,6 +1009,7 @@ main(int argc, char **argv)
uint16_t nb_ports, portid;
uint32_t i;
unsigned int nb_mbufs;
+   size_t sz;
 
/* Init EAL. 8< */
ret = rte_eal_init(argc, argv);
@@ -1019,9 +1039,10 @@ main(int argc, char **argv)
MIN_POOL_SIZE);
 
/* Create the mbuf pool */
+   sz = max_frame_size + RTE_PKTMBUF_HEADROOM;
+   sz = RTE_MAX(sz, (size_t)RTE_MBUF_DEFAULT_BUF_SIZE);
ioat_pktmbuf_pool = rte_pktmbuf_pool_c

[dpdk-dev] [PATCH v4 5/8] examples/ioat: add signal-triggered device dumps

2021-10-14 Thread Kevin Laatz
Enable dumping device info via the signal handler. With this change, when a
SIGUSR1 is issued, the application will print a dump of all devices being
used by the application.
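
In this patch the dump is called directly from the signal handler. One possible alternative, sketched here under the assumption that the application has a polling loop to check from, is to record the request in the handler and perform the dump later, since stdio output is not on POSIX's list of async-signal-safe functions. The names `on_sigusr1()` and `consume_dump_request()` are hypothetical.

```c
#include <signal.h>

/* Set by the handler, cleared by the polling loop. */
static volatile sig_atomic_t dump_requested;

/* Handler only records the request; no I/O happens in signal context. */
static void
on_sigusr1(int signum)
{
	(void)signum;
	dump_requested = 1;
}

/* Returns 1 once per pending request; the caller would then run the dump. */
static int
consume_dump_request(void)
{
	if (!dump_requested)
		return 0;
	dump_requested = 0;
	return 1;
}
```

A stats or main loop could call consume_dump_request() on each iteration and invoke the device dump when it returns 1.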

Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 examples/ioat/ioatfwd.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index 8c4920b798..8bb69c1c14 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -1007,6 +1007,20 @@ port_init(uint16_t portid, struct rte_mempool 
*mbuf_pool, uint16_t nb_queues)
cfg.ports[cfg.nb_ports++].nb_queues = nb_queues;
 }
 
+/* Get a device dump for each device being used by the application */
+static void
+rawdev_dump(void)
+{
+   uint32_t i, j;
+
+   if (copy_mode != COPY_MODE_IOAT_NUM)
+   return;
+
+   for (i = 0; i < cfg.nb_ports; i++)
+   for (j = 0; j < cfg.ports[i].nb_queues; j++)
+   rte_rawdev_dump(cfg.ports[i].ioat_ids[j], stdout);
+}
+
 static void
 signal_handler(int signum)
 {
@@ -1014,6 +1028,8 @@ signal_handler(int signum)
printf("\n\nSignal %d received, preparing to exit...\n",
signum);
force_quit = true;
+   } else if (signum == SIGUSR1) {
+   rawdev_dump();
}
 }
 
@@ -1037,6 +1053,7 @@ main(int argc, char **argv)
force_quit = false;
signal(SIGINT, signal_handler);
signal(SIGTERM, signal_handler);
+   signal(SIGUSR1, signal_handler);
 
nb_ports = rte_eth_dev_count_avail();
if (nb_ports == 0)
-- 
2.30.2



[dpdk-dev] [PATCH v4 4/8] examples/ioat: add cmd line option to control stats print interval

2021-10-14 Thread Kevin Laatz
Add a command line option to control the interval between stats prints.

Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 doc/guides/sample_app_ug/ioat.rst |  4 +++-
 examples/ioat/ioatfwd.c   | 31 +++
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/doc/guides/sample_app_ug/ioat.rst 
b/doc/guides/sample_app_ug/ioat.rst
index 127129dd4b..1edad3f9ac 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -46,7 +46,7 @@ The application requires a number of command line options:
 .. code-block:: console
 
 .//examples/dpdk-ioat [EAL options] -- [-p MASK] [-q NQ] [-s 
RS] [-c ]
-[--[no-]mac-updating] [-b BS] [-f FS]
+[--[no-]mac-updating] [-b BS] [-f FS] [-i SI]
 
 where,
 
@@ -68,6 +68,8 @@ where,
 
 *   f FS: set the max frame size
 
+*   i SI: set the interval, in seconds, between statistics prints (default is 1)
+
 The application can be launched in various configurations depending on
 provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index 04ed175432..8c4920b798 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -26,6 +26,7 @@
 #define CMD_LINE_OPT_RING_SIZE "ring-size"
 #define CMD_LINE_OPT_BATCH_SIZE "dma-batch-size"
 #define CMD_LINE_OPT_FRAME_SIZE "max-frame-size"
+#define CMD_LINE_OPT_STATS_INTERVAL "stats-interval"
 
 /* configurable number of RX/TX ring descriptors */
 #define RX_DEFAULT_RINGSIZE 1024
@@ -95,6 +96,9 @@ static copy_mode_t copy_mode = COPY_MODE_IOAT_NUM;
  */
 static unsigned short ring_size = 2048;
 
+/* interval, in seconds, between stats prints */
+static unsigned short stats_interval = 1;
+
 /* global transmission config */
 struct rxtx_transmission_config cfg;
 
@@ -152,15 +156,15 @@ print_total_stats(struct total_statistics *ts)
"\nTotal packets Tx: %24"PRIu64" [pps]"
"\nTotal packets Rx: %24"PRIu64" [pps]"
"\nTotal packets dropped: %19"PRIu64" [pps]",
-   ts->total_packets_tx,
-   ts->total_packets_rx,
-   ts->total_packets_dropped);
+   ts->total_packets_tx / stats_interval,
+   ts->total_packets_rx / stats_interval,
+   ts->total_packets_dropped / stats_interval);
 
if (copy_mode == COPY_MODE_IOAT_NUM) {
printf("\nTotal IOAT successful enqueues: %8"PRIu64" [enq/s]"
"\nTotal IOAT failed enqueues: %12"PRIu64" [enq/s]",
-   ts->total_successful_enqueues,
-   ts->total_failed_enqueues);
+   ts->total_successful_enqueues / stats_interval,
+   ts->total_failed_enqueues / stats_interval);
}
 
printf("\n\n");
@@ -248,10 +252,10 @@ print_stats(char *prgname)
memset(&ts, 0, sizeof(struct total_statistics));
 
while (!force_quit) {
-   /* Sleep for 1 second each round - init sleep allows reading
+   /* Sleep for "stats_interval" seconds each round - init sleep 
allows reading
 * messages from app startup.
 */
-   sleep(1);
+   sleep(stats_interval);
 
/* Clear screen and move to top left */
printf("%s%s", clr, topLeft);
@@ -614,7 +618,8 @@ ioat_usage(const char *prgname)
"   - The source MAC address is replaced by the TX port MAC address\n"
"   - The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID\n"
"  -c --copy-type CT: type of copy: sw|hw\n"
-   "  -s --ring-size RS: size of IOAT rawdev ring for hardware copy mode or rte_ring for software copy mode\n",
+   "  -s --ring-size RS: size of IOAT rawdev ring for hardware copy mode or rte_ring for software copy mode\n"
+   "  -i --stats-interval SI: interval, in seconds, between stats prints (default is 1)\n",
prgname);
 }
 
@@ -654,6 +659,7 @@ ioat_parse_args(int argc, char **argv, unsigned int nb_ports)
"p:"  /* portmask */
"q:"  /* number of RX queues per port */
"s:"  /* ring size */
+   "i:"  /* interval, in seconds, between stats prints */
;
 
static const struct option lgopts[] = {
@@ -665,6 +671,7 @@ ioat_parse_args(int argc, char **argv, unsigned int nb_ports)
{CMD_LINE_OPT_RING_SIZE, required_argument, NULL, 's'},
{CMD_LINE_OPT_BATCH_SIZE, required_argument, NULL, 'b'},
{CMD_LINE_OPT_FRAME_SIZE, required_argument, NULL, 'f'},
+   {CMD_LINE_OPT_STATS_INTERVAL, required_argument, NULL, 'i'},
{NULL, 0, 0, 0}
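
Note on the rate calculation: the per-second figures in this patch come from dividing each counter by stats_interval. The sketch below shows that normalisation in isolation. parse_stats_interval() and per_second() are hypothetical helpers, not functions from ioatfwd.c, and rejecting an interval of 0 is an assumption added here so the division can never be by zero.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse the value given to -i/--stats-interval.
 * Returns 0 and stores the value on success, -1 on invalid input.
 * Rejecting 0 is an assumption of this sketch, not taken from the patch.
 */
static int
parse_stats_interval(const char *arg, unsigned short *interval)
{
	char *end = NULL;
	unsigned long val;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (val == 0 || val > UINT16_MAX)
		return -1;
	*interval = (unsigned short)val;
	return 0;
}

/* Convert a counter accumulated over "interval" seconds into a rate.
 * The caller must pass a non-zero interval.
 */
static uint64_t
per_second(uint64_t delta, unsigned short interval)
{
	return delta / interval;
}
```

With an interval of 5 seconds and 1000 packets counted in that window, per_second() reports 200.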
  

[dpdk-dev] [PATCH v4 6/8] examples/ioat: port application to dmadev APIs

2021-10-14 Thread Kevin Laatz
The dmadev library abstraction allows applications to use the same APIs for
all DMA device drivers in DPDK. This patch updates the ioatfwd application
to make use of the new dmadev APIs, in turn making it a generic application
which can be used with any of the DMA device drivers.

Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 

---
v4: update macro name after rebasing
---
 examples/ioat/ioatfwd.c   | 247 --
 examples/ioat/meson.build |   8 +-
 2 files changed, 107 insertions(+), 148 deletions(-)

diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index 8bb69c1c14..5f3564af30 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2019 Intel Corporation
+ * Copyright(c) 2019-2021 Intel Corporation
  */
 
 #include 
@@ -10,11 +10,10 @@
 
 #include 
 #include 
-#include 
-#include 
+#include 
 
 /* size of ring used for software copying between rx and tx. */
-#define RTE_LOGTYPE_IOAT RTE_LOGTYPE_USER1
+#define RTE_LOGTYPE_DMA RTE_LOGTYPE_USER1
 #define MAX_PKT_BURST 32
 #define MEMPOOL_CACHE_SIZE 512
 #define MIN_POOL_SIZE 65536U
@@ -41,8 +40,8 @@ struct rxtx_port_config {
uint16_t nb_queues;
/* for software copy mode */
struct rte_ring *rx_to_tx_ring;
-   /* for IOAT rawdev copy mode */
-   uint16_t ioat_ids[MAX_RX_QUEUES_COUNT];
+   /* for dmadev HW copy mode */
+   uint16_t dmadev_ids[MAX_RX_QUEUES_COUNT];
 };
 
 /* Configuring ports and number of assigned lcores in struct. 8< */
@@ -61,13 +60,13 @@ struct ioat_port_statistics {
uint64_t copy_dropped[RTE_MAX_ETHPORTS];
 };
 struct ioat_port_statistics port_statistics;
-
 struct total_statistics {
uint64_t total_packets_dropped;
uint64_t total_packets_tx;
uint64_t total_packets_rx;
-   uint64_t total_successful_enqueues;
-   uint64_t total_failed_enqueues;
+   uint64_t total_submitted;
+   uint64_t total_completed;
+   uint64_t total_failed;
 };
 
 typedef enum copy_mode_t {
@@ -98,6 +97,15 @@ static unsigned short ring_size = 2048;
 
 /* interval, in seconds, between stats prints */
 static unsigned short stats_interval = 1;
+/* global mbuf arrays for tracking DMA bufs */
+#define MBUF_RING_SIZE 2048
+#define MBUF_RING_MASK (MBUF_RING_SIZE - 1)
+struct dma_bufs {
+   struct rte_mbuf *bufs[MBUF_RING_SIZE];
+   struct rte_mbuf *copies[MBUF_RING_SIZE];
+   uint16_t sent;
+};
+static struct dma_bufs dma_bufs[RTE_DMADEV_DEFAULT_MAX];
 
 /* global transmission config */
 struct rxtx_transmission_config cfg;
@@ -135,36 +143,32 @@ print_port_stats(uint16_t port_id)
 
 /* Print out statistics for one IOAT rawdev device. */
 static void
-print_rawdev_stats(uint32_t dev_id, uint64_t *xstats,
-   unsigned int *ids_xstats, uint16_t nb_xstats,
-   struct rte_rawdev_xstats_name *names_xstats)
+print_dmadev_stats(uint32_t dev_id, struct rte_dma_stats stats)
 {
-   uint16_t i;
-
-   printf("\nIOAT channel %u", dev_id);
-   for (i = 0; i < nb_xstats; i++)
-   printf("\n\t %s: %*"PRIu64,
-   names_xstats[ids_xstats[i]].name,
-   (int)(37 - strlen(names_xstats[ids_xstats[i]].name)),
-   xstats[i]);
+   printf("\nDMA channel %u", dev_id);
+   printf("\n\t Total submitted ops: %"PRIu64"", stats.submitted);
+   printf("\n\t Total completed ops: %"PRIu64"", stats.completed);
+   printf("\n\t Total failed ops: %"PRIu64"", stats.errors);
 }
 
 static void
 print_total_stats(struct total_statistics *ts)
 {
printf("\nAggregate statistics ==="
-   "\nTotal packets Tx: %24"PRIu64" [pps]"
-   "\nTotal packets Rx: %24"PRIu64" [pps]"
-   "\nTotal packets dropped: %19"PRIu64" [pps]",
+   "\nTotal packets Tx: %22"PRIu64" [pkt/s]"
+   "\nTotal packets Rx: %22"PRIu64" [pkt/s]"
+   "\nTotal packets dropped: %17"PRIu64" [pkt/s]",
ts->total_packets_tx / stats_interval,
ts->total_packets_rx / stats_interval,
ts->total_packets_dropped / stats_interval);
 
if (copy_mode == COPY_MODE_IOAT_NUM) {
-   printf("\nTotal IOAT successful enqueues: %8"PRIu64" [enq/s]"
-   "\nTotal IOAT failed enqueues: %12"PRIu64" [enq/s]",
-   ts->total_successful_enqueues / stats_interval,
-   ts->total_failed_enqueues / stats_interval);
+   printf("\nTotal submitted ops: %19"PRIu64" [ops/s]"
+   "\nTotal completed ops: %19"PRIu64" [ops/s]"
+   "\nTotal failed ops: %22"PRIu64" [ops/s]",
+   ts->total_submitted / stats_interval,
+   ts->total_completed / stats_interval,
+   ts->total_failed / stats_interval);
}
 
printf
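
Note on the dma_bufs tracking arrays: the struct added above stores in-flight buffers in fixed arrays of MBUF_RING_SIZE entries and defines MBUF_RING_MASK as size minus one, which suggests the usual power-of-two masking of a free-running counter. The following is a minimal sketch of that indexing in plain C, under that assumption. struct slot_ring, ring_put() and ring_get() are hypothetical names, void * stands in for struct rte_mbuf *, and the "completed" counter is added here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 2048              /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

struct slot_ring {
	void *slots[RING_SIZE];
	uint16_t sent;      /* free-running count of stored entries */
	uint16_t completed; /* free-running count of retrieved entries */
};

/* Store one pointer; returns -1 if all slots are in flight. */
static int
ring_put(struct slot_ring *r, void *p)
{
	if ((uint16_t)(r->sent - r->completed) == RING_SIZE)
		return -1;
	r->slots[r->sent & RING_MASK] = p;
	r->sent++;
	return 0;
}

/* Retrieve the oldest stored pointer, or NULL if the ring is empty. */
static void *
ring_get(struct slot_ring *r)
{
	void *p;

	if (r->sent == r->completed)
		return NULL;
	p = r->slots[r->completed & RING_MASK];
	r->completed++;
	return p;
}
```

Because 65536 is a multiple of the ring size, the 16-bit counters can wrap without disturbing the slot order.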

[dpdk-dev] [PATCH v4 7/8] examples/ioat: update naming to match change to dmadev

2021-10-14 Thread Kevin Laatz
Existing functions, structures, defines etc need to be updated to reflect
the change to using the dmadev APIs.

Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 examples/ioat/ioatfwd.c | 189 
 1 file changed, 94 insertions(+), 95 deletions(-)

diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index 5f3564af30..77052ef892 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -53,13 +53,13 @@ struct rxtx_transmission_config {
 /* >8 End of configuration of ports and number of assigned lcores. */
 
 /* per-port statistics struct */
-struct ioat_port_statistics {
+struct dma_port_statistics {
uint64_t rx[RTE_MAX_ETHPORTS];
uint64_t tx[RTE_MAX_ETHPORTS];
uint64_t tx_dropped[RTE_MAX_ETHPORTS];
uint64_t copy_dropped[RTE_MAX_ETHPORTS];
 };
-struct ioat_port_statistics port_statistics;
+struct dma_port_statistics port_statistics;
 struct total_statistics {
uint64_t total_packets_dropped;
uint64_t total_packets_tx;
@@ -72,14 +72,14 @@ struct total_statistics {
 typedef enum copy_mode_t {
 #define COPY_MODE_SW "sw"
COPY_MODE_SW_NUM,
-#define COPY_MODE_IOAT "hw"
-   COPY_MODE_IOAT_NUM,
+#define COPY_MODE_DMA "hw"
+   COPY_MODE_DMA_NUM,
COPY_MODE_INVALID_NUM,
COPY_MODE_SIZE_NUM = COPY_MODE_INVALID_NUM
 } copy_mode_t;
 
 /* mask of enabled ports */
-static uint32_t ioat_enabled_port_mask;
+static uint32_t dma_enabled_port_mask;
 
 /* number of RX queues per port */
 static uint16_t nb_queues = 1;
@@ -88,9 +88,9 @@ static uint16_t nb_queues = 1;
 static int mac_updating = 1;
 
 /* hardare copy mode enabled by default. */
-static copy_mode_t copy_mode = COPY_MODE_IOAT_NUM;
+static copy_mode_t copy_mode = COPY_MODE_DMA_NUM;
 
-/* size of IOAT rawdev ring for hardware copy mode or
+/* size of descriptor ring for hardware copy mode or
  * rte_ring for software copy mode
  */
 static unsigned short ring_size = 2048;
@@ -116,14 +116,14 @@ static uint16_t nb_txd = TX_DEFAULT_RINGSIZE;
 
 static volatile bool force_quit;
 
-static uint32_t ioat_batch_sz = MAX_PKT_BURST;
+static uint32_t dma_batch_sz = MAX_PKT_BURST;
 static uint32_t max_frame_size = RTE_ETHER_MAX_LEN;
 
 /* ethernet addresses of ports */
-static struct rte_ether_addr ioat_ports_eth_addr[RTE_MAX_ETHPORTS];
+static struct rte_ether_addr dma_ports_eth_addr[RTE_MAX_ETHPORTS];
 
 static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
-struct rte_mempool *ioat_pktmbuf_pool;
+struct rte_mempool *dma_pktmbuf_pool;
 
 /* Print out statistics for one port. */
 static void
@@ -141,7 +141,7 @@ print_port_stats(uint16_t port_id)
port_statistics.copy_dropped[port_id]);
 }
 
-/* Print out statistics for one IOAT rawdev device. */
+/* Print out statistics for one dmadev device. */
 static void
 print_dmadev_stats(uint32_t dev_id, struct rte_dma_stats stats)
 {
@@ -162,7 +162,7 @@ print_total_stats(struct total_statistics *ts)
ts->total_packets_rx / stats_interval,
ts->total_packets_dropped / stats_interval);
 
-   if (copy_mode == COPY_MODE_IOAT_NUM) {
+   if (copy_mode == COPY_MODE_DMA_NUM) {
printf("\nTotal submitted ops: %19"PRIu64" [ops/s]"
"\nTotal completed ops: %19"PRIu64" [ops/s]"
"\nTotal failed ops: %22"PRIu64" [ops/s]",
@@ -196,7 +196,7 @@ print_stats(char *prgname)
status_strlen += snprintf(status_string + status_strlen,
sizeof(status_string) - status_strlen,
"Copy Mode = %s,\n", copy_mode == COPY_MODE_SW_NUM ?
-   COPY_MODE_SW : COPY_MODE_IOAT);
+   COPY_MODE_SW : COPY_MODE_DMA);
status_strlen += snprintf(status_string + status_strlen,
sizeof(status_string) - status_strlen,
"Updating MAC = %s, ", mac_updating ?
@@ -235,7 +235,7 @@ print_stats(char *prgname)
delta_ts.total_packets_rx +=
port_statistics.rx[port_id];
 
-   if (copy_mode == COPY_MODE_IOAT_NUM) {
+   if (copy_mode == COPY_MODE_DMA_NUM) {
uint32_t j;
 
for (j = 0; j < cfg.ports[i].nb_queues; j++) {
@@ -286,7 +286,7 @@ update_mac_addrs(struct rte_mbuf *m, uint32_t dest_portid)
*((uint64_t *)tmp) = 0x0002 + ((uint64_t)dest_portid << 40);
 
/* src addr */
-   rte_ether_addr_copy(&ioat_ports_eth_addr[dest_portid], &eth->src_addr);
+   rte_ether_addr_copy(&dma_ports_eth_addr[dest_portid], &eth->src_addr);
 }
 
 /* Perform packet copy there is a user-defined function. 8< */
@@ -309,7 +309,7 @@ pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst)
 /* >8 End of perform packet copy there is a user-defined function. */
 
 static uint32_t
-ioat_enqueue_packets(struct rte_mbuf *pkts[], struct rte_mbuf *pkts_copy[],
+dma_enqueue_packets(stru

[dpdk-dev] [PATCH v4 8/8] examples/ioat: rename application to dmafwd

2021-10-14 Thread Kevin Laatz
Since the APIs have been updated from rawdev to dmadev, the application
should also be renamed to match. This patch also includes the documentation
updates for the renaming.

Signed-off-by: Kevin Laatz 
Reviewed-by: Conor Walsh 
---
 .../sample_app_ug/{ioat.rst => dma.rst}   | 102 +-
 doc/guides/sample_app_ug/index.rst|   2 +-
 doc/guides/sample_app_ug/intro.rst|   4 +-
 examples/{ioat => dma}/Makefile   |   4 +-
 examples/{ioat/ioatfwd.c => dma/dmafwd.c} |   0
 examples/{ioat => dma}/meson.build|   2 +-
 examples/meson.build  |   2 +-
 7 files changed, 58 insertions(+), 58 deletions(-)
 rename doc/guides/sample_app_ug/{ioat.rst => dma.rst} (75%)
 rename examples/{ioat => dma}/Makefile (97%)
 rename examples/{ioat/ioatfwd.c => dma/dmafwd.c} (100%)
 rename examples/{ioat => dma}/meson.build (94%)

diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/dma.rst
similarity index 75%
rename from doc/guides/sample_app_ug/ioat.rst
rename to doc/guides/sample_app_ug/dma.rst
index 1edad3f9ac..4b8e607774 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/dma.rst
@@ -1,17 +1,17 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
-Copyright(c) 2019 Intel Corporation.
+Copyright(c) 2019-2021 Intel Corporation.
 
 .. include:: 
 
-Packet copying using Intel\ |reg| QuickData Technology
-==
+Packet copying using DMAdev library
+===
 
 Overview
 
 
 This sample is intended as a demonstration of the basic components of a DPDK
-forwarding application and example of how to use IOAT driver API to make
-packets copies.
+forwarding application and example of how to use the DMAdev API to make a packet
+copy application.
 
 Also while forwarding, the MAC addresses are affected as follows:
 
@@ -29,7 +29,7 @@ Compiling the Application
 
 To compile the sample application see :doc:`compiling`.
 
-The application is located in the ``ioat`` sub-directory.
+The application is located in the ``dma`` sub-directory.
 
 
 Running the Application
@@ -38,8 +38,8 @@ Running the Application
 In order to run the hardware copy application, the copying device
 needs to be bound to user-space IO driver.
 
-Refer to the "IOAT Rawdev Driver" chapter in the "Rawdev Drivers" document
-for information on using the driver.
+Refer to the "DMAdev library" chapter in the "Programmers guide" for information
+on using the library.
 
 The application requires a number of command line options:
 
@@ -52,13 +52,13 @@ where,
 
 *   p MASK: A hexadecimal bitmask of the ports to configure (default is all)
 
-*   q NQ: Number of Rx queues used per port equivalent to CBDMA channels
+*   q NQ: Number of Rx queues used per port equivalent to DMA channels
 per port (default is 1)
 
 *   c CT: Performed packet copy type: software (sw) or hardware using
 DMA (hw) (default is hw)
 
-*   s RS: Size of IOAT rawdev ring for hardware copy mode or rte_ring for
+*   s RS: Size of dmadev descriptor ring for hardware copy mode or rte_ring for
 software copy mode (default is 2048)
 
 *   --[no-]mac-updating: Whether MAC address of packets should be changed
@@ -87,7 +87,7 @@ updating issue the command:
 
 .. code-block:: console
 
-$ .//examples/dpdk-ioat -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
+$ .//examples/dpdk-dma -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
 
 To run the application in a Linux environment with 2 lcores (the main lcore,
 plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
@@ -95,7 +95,7 @@ updating issue the command:
 
 .. code-block:: console
 
-$ .//examples/dpdk-ioat -l 0-1 -n 1 -- -p 0x3 --no-mac-updating -c hw
+$ .//examples/dpdk-dma -l 0-1 -n 1 -- -p 0x3 --no-mac-updating -c hw
 
 Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL) options.
@@ -120,7 +120,7 @@ The first task is to initialize the Environment Abstraction Layer (EAL).
 The ``argc`` and ``argv`` arguments are provided to the ``rte_eal_init()``
 function. The value returned is the number of parsed arguments:
 
-.. literalinclude:: ../../../examples/ioat/ioatfwd.c
+.. literalinclude:: ../../../examples/dma/dmafwd.c
 :language: c
 :start-after: Init EAL. 8<
 :end-before: >8 End of init EAL.
@@ -130,7 +130,7 @@ function. The value returned is the number of parsed arguments:
 The ``main()`` also allocates a mempool to hold the mbufs (Message Buffers)
 used by the application:
 
-.. literalinclude:: ../../../examples/ioat/ioatfwd.c
+.. literalinclude:: ../../../examples/dma/dmafwd.c
 :language: c
 :start-after: Allocates mempool to hold the mbufs. 8<
 :end-before: >8 End of allocates mempool to hold the mbufs.
@@ -141,7 +141,7 @@ detail in the "Mbuf Library" section of the *DPDK Programmer's Guide

[dpdk-dev] [PATCH] app/test-eventdev: fix terminal colour after control-c exit

2021-10-14 Thread Harry van Haaren
Before this commit, a Control^C exit of the test-eventdev application
would print the worker packet percentages, and leave the terminal with
a green colour despite the colour reset being issued after the newline.
By moving the colour reset command before the \n the issue is fixed.

Fixes: 6b1a14a83a06 ("app/eventdev: add packet distribution logs")

Signed-off-by: Harry van Haaren 

---

Given this is an aesthetic-only fix, I feel it's not worth backporting.
Cc: pbhagavat...@marvell.com

---
 app/test-eventdev/test_perf_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/test-eventdev/test_perf_common.c b/app/test-eventdev/test_perf_common.c
index e0d9f05ecd..a1b8dd72ee 100644
--- a/app/test-eventdev/test_perf_common.c
+++ b/app/test-eventdev/test_perf_common.c
@@ -19,7 +19,7 @@ perf_test_result(struct evt_test *test, struct evt_options *opt)
total += t->worker[i].processed_pkts;
for (i = 0; i < t->nb_workers; i++)
printf("Worker %d packets: "CLGRN"%"PRIx64" "CLNRM"percentage:"
-   CLGRN" %3.2f\n"CLNRM, i,
+   CLGRN" %3.2f"CLNRM"\n", i,
t->worker[i].processed_pkts,
(((double)t->worker[i].processed_pkts)/total)
* 100);
-- 
2.30.2
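
Note on the ordering: the fix relies on the colour reset sequence being written before the newline rather than after it. A minimal sketch that formats one worker line into a buffer, so the ordering can be checked, is given below. The CLGRN and CLNRM values are assumed to be the standard ANSI SGR sequences (the definitions used by test-eventdev are not shown in this patch), and format_worker_line() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CLNRM "\x1b[0m"   /* assumed value: ANSI reset attributes */
#define CLGRN "\x1b[32m"  /* assumed value: ANSI green foreground */

/* Format one "Worker N packets" line with the colour reset placed
 * before the trailing newline, as in the fixed printf() above.
 * Returns the snprintf() result.
 */
static int
format_worker_line(char *buf, size_t len, int worker,
		uint64_t pkts, uint64_t total)
{
	double pct = total ? ((double)pkts / (double)total) * 100 : 0.0;

	return snprintf(buf, len,
			"Worker %d packets: " CLGRN "%" PRIx64 " " CLNRM
			"percentage:" CLGRN " %3.2f" CLNRM "\n",
			worker, pkts, pct);
}
```

With this ordering, the last bytes written for each line are the reset sequence followed by the newline, so an interrupted run should not leave the green attribute active on the following line.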



Re: [dpdk-dev] [PATCH 14/32] net/ngbe: support Rx interrupt

2021-10-14 Thread Jiawen Wu
On September 16, 2021 12:54 AM, Ferruh Yigit wrote:
> On 9/8/2021 9:37 AM, Jiawen Wu wrote:
> > Support Rx queue interrupt.
> >
> > Signed-off-by: Jiawen Wu 
> > ---
> >  doc/guides/nics/features/ngbe.ini |  1 +
> >  doc/guides/nics/ngbe.rst  |  1 +
> >  drivers/net/ngbe/ngbe_ethdev.c| 35
> +++
> >  3 files changed, 37 insertions(+)
> >
> > diff --git a/doc/guides/nics/features/ngbe.ini
> > b/doc/guides/nics/features/ngbe.ini
> > index 1006c3935b..d14469eb43 100644
> > --- a/doc/guides/nics/features/ngbe.ini
> > +++ b/doc/guides/nics/features/ngbe.ini
> > @@ -7,6 +7,7 @@
> >  Speed capabilities   = Y
> >  Link status  = Y
> >  Link status event= Y
> > +Rx interrupt = Y
> 
> This also requires configuring Rx interrupts if user 'dev_conf.intr_conf.rxq'
> config requests it.
> 
> Can an application request and use Rx interrupts with the current status of the
> driver? Did you test it?

I can't find a corresponding test case in examples. Could you give me a suggestion?
I just configured almost the same registers as the kernel driver does.
For now I'll drop this feature and wait for a successful test result.






[dpdk-dev] Re: [PATCH v6 0/6] hide eth dev related structures

2021-10-14 Thread Feifei Wang


> -Original Message-
> From: dev  On Behalf Of Ferruh Yigit
> Sent: Thursday, October 14, 2021 4:16 AM
> To: Konstantin Ananyev ;
> dev@dpdk.org; jer...@marvell.com; Ajit Khaparde
> (ajit.khapa...@broadcom.com) ; Raslan
> Darawsheh ; Andrew Rybchenko
> ; Qi Zhang ;
> Honnappa Nagarahalli 
> Cc: xiaoyun...@intel.com; ano...@marvell.com; jer...@marvell.com;
> ndabilpu...@marvell.com; adwiv...@marvell.com;
> shepard.sie...@atomicrules.com; ed.cz...@atomicrules.com;
> john.mil...@atomicrules.com; irussk...@marvell.com; Ajit Khaparde
> (ajit.khapa...@broadcom.com) ;
> somnath.ko...@broadcom.com; rahul.lakkire...@chelsio.com;
> hemant.agra...@nxp.com; sachin.sax...@oss.nxp.com;
> haiyue.w...@intel.com; johnd...@cisco.com; hyon...@cisco.com;
> qi.z.zh...@intel.com; xiao.w.w...@intel.com; humi...@huawei.com;
> yisen.zhu...@huawei.com; ouli...@huawei.com; beilei.x...@intel.com;
> jingjing...@intel.com; qiming.y...@intel.com; ma...@nvidia.com;
> viachesl...@nvidia.com; sthem...@microsoft.com; lon...@microsoft.com;
> heinrich.k...@corigine.com; kirankum...@marvell.com;
> andrew.rybche...@oktetlabs.ru; mcze...@marvell.com;
> jiawe...@trustnetic.com; jianw...@trustnetic.com;
> maxime.coque...@redhat.com; chenbo@intel.com;
> tho...@monjalon.net; m...@ashroe.eu; jay.jayatheert...@intel.com
> Subject: Re: [dpdk-dev] [PATCH v6 0/6] hide eth dev related structures
> 
> On 10/13/2021 2:36 PM, Konstantin Ananyev wrote:
> > v6 changes:
> > - Update comments (Andrew)
> > - Move callback related variables under corresponding ifdefs (Andrew)
> > - Few nits in rte_eth_macaddrs_get (Andrew)
> > - Rebased on top of next-net tree
> >
> > v5 changes:
> > - Fix spelling (Thomas/David)
> > - Rename internal helper functions (David)
> > - Reorder patches and update commit messages (Thomas)
> > - Update comments (Thomas)
> > - Changed layout in rte_eth_fp_ops, to group functions and
> > related data based on their functionality:
> > first 64B line for Rx, second one for Tx.
> > Didn't observe any real performance difference compared to the
> > original layout, but decided to keep the new one, as it seems
> > a bit more plausible.
> >
> > v4 changes:
> >   - Fix secondary process attach (Pavan)
> >   - Fix build failure (Ferruh)
> >   - Update lib/ethdev/version.map (Ferruh)
> > Note that moving newly added symbols from EXPERIMENTAL to DPDK_22
> > section makes checkpatch.sh complain.
> >
> > v3 changes:
> >   - Changes in public struct naming (Jerin/Haiyue)
> >   - Split patches
> >   - Update docs
> >   - Shamelessly included Andrew's patch:
> >
> https://patches.dpdk.org/project/dpdk/patch/20210928154856.1015020-1-
> andrew.rybche...@oktetlabs.ru/
> > into this series.
> > I have to do a similar thing here, so decided to avoid duplicated effort.
> >
> > The aim of this patch series is to make rte_ethdev core data
> > structures (rte_eth_dev, rte_eth_dev_data, rte_eth_rxtx_callback,
> > etc.) internal to DPDK and not visible to the user.
> > That should allow future possible changes to core ethdev related
> > structures to be transparent to the user and help to improve ABI/API
> stability.
> > Note that current ethdev API is preserved, but it is a formal ABI break.
> >
> > The work is based on previous discussions at:
> > https://www.mail-archive.com/dev@dpdk.org/msg211405.html
> > https://www.mail-archive.com/dev@dpdk.org/msg216685.html
> > and consists of the following main points:
> > 1. Copy public 'fast' function pointers (rx_pkt_burst(), etc.) and
> > related data pointer from rte_eth_dev into a separate flat array.
> > We keep it public to still be able to use inline functions for these
> > 'fast' calls (like rte_eth_rx_burst(), etc.) to avoid/minimize slowdown.
> > Note that apart from the function pointers themselves, each element of this
> > flat array also contains two opaque pointers for each ethdev:
> > 1) a pointer to an array of internal queue data pointers
> > 2) a pointer to an array of queue callback data pointers.
> > Note that exposing this extra information allows us to avoid extra
> > changes inside PMD level, plus should help to avoid possible
> > performance degradation.
> > 2. Change implementation of 'fast' inline ethdev functions
> > (rte_eth_rx_burst(), etc.) to use new public flat array.
> > While it is an ABI breakage, this change is intended to be transparent
> > for both users (no changes in user apps are required) and PMD developers
> > (no changes in PMDs are required).
> > One extra note: with the new implementation, RX/TX callback invocation
> > will cost one extra function call. That might cause
> > some slowdown for code paths with RX/TX callbacks heavily involved.
> > Hope such a trade-off is acceptable for the community.
> > 3. Move rte_eth_dev, rte_eth_dev_data, rte_eth_rxtx_callback and related
> > things into internal header: .
> >
> > That approach was selected to:
> >- Avoid(/minimize) possible per

[dpdk-dev] [PATCH v1] net/mlx5: fix RSS expansion for L2/L3 VXLAN

2021-10-14 Thread Lior Margalit
The RSS expansion algorithm uses a graph to find the possible
expansion paths. The current implementation does not differentiate
between standard (L2) VXLAN and L3 VXLAN. As a result, the flow is expanded
with all possible paths.
For example:
testpmd> flow create... / vxlan / end actions rss level 2 / end
It is currently expanded to the following paths:
ETH IPV4 UDP VXLAN END
ETH IPV4 UDP VXLAN ETH IPV4 END
ETH IPV4 UDP VXLAN ETH IPV6 END
ETH IPV4 UDP VXLAN IPV4 END
ETH IPV4 UDP VXLAN IPV6 END

The fix is to adjust the expansion according to the outer UDP destination
port. If the flow pattern matches on the standard UDP port, 4789, or does
not match on the destination port at all (which also implies the standard
one), the expansion for the above example will be:
ETH IPV4 UDP VXLAN END
ETH IPV4 UDP VXLAN ETH IPV4 END
ETH IPV4 UDP VXLAN ETH IPV6 END
Otherwise, the expansion will be:
ETH IPV4 UDP VXLAN END
ETH IPV4 UDP VXLAN IPV4 END
ETH IPV4 UDP VXLAN IPV6 END

Fixes: f4f06e361516 ("net/mlx5: add flow VXLAN item")
Cc: sta...@dpdk.org

Signed-off-by: Lior Margalit 
Acked-by: Matan Azrad 

---
 drivers/net/mlx5/mlx5_flow.c | 71 +---
 1 file changed, 66 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c914a7120c..509ca01859 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -132,6 +132,12 @@ struct mlx5_flow_expand_rss {
 static void
 mlx5_dbg__print_pattern(const struct rte_flow_item *item);
 
+static const struct mlx5_flow_expand_node *
+mlx5_flow_expand_rss_adjust_node(const struct rte_flow_item *pattern,
+   unsigned int item_idx,
+   const struct mlx5_flow_expand_node graph[],
+   const struct mlx5_flow_expand_node *node);
+
 static bool
 mlx5_flow_is_rss_expandable_item(const struct rte_flow_item *item)
 {
@@ -318,7 +324,7 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
const int *stack[MLX5_RSS_EXP_ELT_N];
int stack_pos = 0;
struct rte_flow_item flow_items[MLX5_RSS_EXP_ELT_N];
-   unsigned int i;
+   unsigned int i, item_idx, last_expand_item_idx = 0;
size_t lsize;
size_t user_pattern_size = 0;
void *addr = NULL;
@@ -326,7 +332,7 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
struct rte_flow_item missed_item;
int missed = 0;
int elt = 0;
-   const struct rte_flow_item *last_item = NULL;
+   const struct rte_flow_item *last_expand_item = NULL;
 
memset(&missed_item, 0, sizeof(missed_item));
lsize = offsetof(struct mlx5_flow_expand_rss, entry) +
@@ -337,12 +343,15 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
buf->entry[0].pattern = (void *)&buf->entry[MLX5_RSS_EXP_ELT_N];
buf->entries = 0;
addr = buf->entry[0].pattern;
-   for (item = pattern; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
+   for (item = pattern, item_idx = 0;
+   item->type != RTE_FLOW_ITEM_TYPE_END;
+   item++, item_idx++) {
if (!mlx5_flow_is_rss_expandable_item(item)) {
user_pattern_size += sizeof(*item);
continue;
}
-   last_item = item;
+   last_expand_item = item;
+   last_expand_item_idx = item_idx;
i = 0;
while (node->next && node->next[i]) {
next = &graph[node->next[i]];
@@ -374,7 +383,7 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
 * Check if the last valid item has spec set, need complete pattern,
 * and the pattern can be used for expansion.
 */
-   missed_item.type = mlx5_flow_expand_rss_item_complete(last_item);
+   missed_item.type = mlx5_flow_expand_rss_item_complete(last_expand_item);
if (missed_item.type == RTE_FLOW_ITEM_TYPE_END) {
/* Item type END indicates expansion is not required. */
return lsize;
@@ -409,6 +418,9 @@ mlx5_flow_expand_rss(struct mlx5_flow_expand_rss *buf, size_t size,
addr = (void *)(((uintptr_t)addr) +
elt * sizeof(*item));
}
+   } else if (last_expand_item != NULL) {
+   node = mlx5_flow_expand_rss_adjust_node(pattern,
+   last_expand_item_idx, graph, node);
}
memset(flow_items, 0, sizeof(flow_items));
next_node = mlx5_flow_expand_rss_skip_explicit(graph,
@@ -495,6 +507,8 @@ enum mlx5_expansion {
MLX5_EXPANSION_OUTER_IPV6_UDP,
MLX5_EXPANSION_OUTER_IPV6_TCP,
MLX5_EXPANSION_VXLAN,
+   MLX5_EXPANSION_STD_VXLAN,
+   MLX5_EXPANSION_L3_VXLAN,
MLX5_EXPANSION_VXLAN_GPE,
MLX5_EXPANSION_GRE,
MLX5_EXPANSION_NVGRE,
@@ -59
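
Note on the selection rule: the commit message describes choosing between the standard and L3 VXLAN expansions from the outer UDP destination port. That decision can be sketched on its own as below. This is not the body of mlx5_flow_expand_rss_adjust_node(), which is cut off above. select_vxlan_expansion() and its enum are hypothetical, and the port is assumed to be already in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

#define STD_VXLAN_UDP_PORT 4789

enum vxlan_expansion {
	EXPANSION_STD_VXLAN, /* inner headers start with ETH */
	EXPANSION_L3_VXLAN,  /* inner headers start with IPV4 or IPV6 */
};

/* Pick the expansion branch from the outer UDP destination port.
 * dst_port_matched is false when the pattern does not match on the
 * destination port, which implies the standard port.
 */
static enum vxlan_expansion
select_vxlan_expansion(bool dst_port_matched, uint16_t dst_port)
{
	if (!dst_port_matched || dst_port == STD_VXLAN_UDP_PORT)
		return EXPANSION_STD_VXLAN;
	return EXPANSION_L3_VXLAN;
}
```

With no port match or a match on 4789, the sketch selects the ETH-based inner paths. Any other matched port selects the IPV4/IPV6 inner paths.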

Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Dmitry Kozlyuk
2021-10-14 09:31 (UTC+), Harman Kalra:
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Thursday, October 14, 2021 1:53 PM
> > To: Harman Kalra 
> > Cc: dev@dpdk.org; Raslan Darawsheh ; Ray Kinsella
> > ; Dmitry Kozlyuk ; David
> > Marchand ; viachesl...@nvidia.com;
> > ma...@nvidia.com
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement
> > get set APIs
> > 
> > 13/10/2021 20:52, Thomas Monjalon:  
> > > 13/10/2021 19:57, Harman Kalra:  
> > > > From: dev  On Behalf Of Harman Kalra  
> > > > > From: Thomas Monjalon   
> > > > > > 04/10/2021 11:57, David Marchand:  
> > > > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra
> > > > > > >   
> > > > > > wrote:  
> > > > > > > > > > > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int size,
> > > > > > > > > > > +   bool from_hugepage) {
> > > > > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > > > > +   int i;
> > > > > > > > > > > +
> > > > > > > > > > > +   if (from_hugepage)
> > > > > > > > > > > +   intr_handle = rte_zmalloc(NULL,
> > > > > > > > > > > + size * sizeof(struct rte_intr_handle),
> > > > > > > > > > > + 0);
> > > > > > > > > > > +   else
> > > > > > > > > > > +   intr_handle = calloc(1, size *
> > > > > > > > > > > + sizeof(struct rte_intr_handle));
> > > > > > > > >
> > > > > > > > > We can call DPDK allocator in all cases.
> > > > > > > > > That would avoid headaches on why multiprocess does not
> > > > > > > > > work in some rarely tested cases.  
> > [...]  
> > > > > > I agree with David.
> > > > > > I prefer a simpler API which always use rte_malloc, and make
> > > > > > sure interrupts are always handled between rte_eal_init and  
> > rte_eal_cleanup.
> > [...]  
> > > > > There are couple of more dependencies on glibc heap APIs:
> > > > > 1. "rte_eal_alarm_init()" allocates an interrupt instance which is
> > > > > used for timerfd, is called before "rte_eal_memory_init()" which
> > > > > does the memseg init.
> > > > > Not sure what all challenges we may face in moving alarm_init
> > > > > after memory_init as it might break some subsystem inits.
> > > > > Other option could be to allocate interrupt instance for timerfd
> > > > > on first alarm_setup call.  
> > >
> > > Indeed it is an issue.
> > >
> > > [...]
> > >  
> > > > > There are many other drivers which statically declares the
> > > > > interrupt handles inside their respective private structures and
> > > > > memory for those structure was allocated from heap. For such
> > > > > drivers I allocated interrupt instances also using glibc heap APIs.  
> > >
> > > Could you use rte_malloc in these drivers?  
> > 
> > If we take the direction of 2 different allocation modes for the interrupts,
> > I suggest we make it automatic without any API parameter.
> > We don't have any function to check rte_malloc readiness I think.
> > But we can detect whether shared memory is ready with this check:
> > rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC
> > This check is true at the end of rte_eal_init, so it is false during probing.
> > Would it be enough? Or should we implement rte_malloc_is_ready()?
> 
> Hi Thomas,
> 
> It's a very good suggestion. Let's implement "rte_malloc_is_ready()", which
> could be as simple as a
> "rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC" check.
> There may be more consumers for this API in future.

I doubt it should be public. How is it supposed to be used?
Any application code for DPDK necessarily calls rte_eal_init() first;
after that, this function would always return true.

> 
> If we are making the detection automatic, should this alloc API still take an argument?
> I added a flags argument (32 bit) in the latest series, where each bit of the
> flags can be an allocation capability.
> I used two bits for discriminating between glibc malloc and rte_malloc.
> Shall we keep it or drop it?
> 
> David, Dmitry please share your thoughts.

I'd drop it, but no strong opinion.
Since allocation type is automatic and all other properties can be set later,
there are no use cases for any options here.
And if they appear, flags may be insufficient as well.
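
Note on the automatic variant: the allocator choice discussed in this thread (pick the allocation mode from a readiness check instead of an API parameter) can be modelled without DPDK as below. Everything here is a hypothetical stand-in. toy_malloc_is_ready() plays the role of the proposed readiness check, both paths use calloc() so the sketch runs anywhere, and the from_shared_mem field is an assumption added so a later free could pick the matching deallocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_intr_handle {
	int fd;
	bool from_shared_mem; /* remembers which allocator was chosen */
};

/* Stand-in for the "shared memory is ready" condition discussed above. */
static bool toy_mem_ready;

static bool
toy_malloc_is_ready(void)
{
	return toy_mem_ready;
}

/* Allocate "count" handles; the allocation mode is detected, not passed in. */
static struct toy_intr_handle *
toy_intr_instance_alloc(size_t count)
{
	struct toy_intr_handle *h;
	bool shared = toy_malloc_is_ready();
	size_t i;

	if (count == 0)
		return NULL;
	/* Model only: a real implementation would call the DPDK allocator
	 * when "shared" is true and the libc allocator otherwise.
	 */
	h = calloc(count, sizeof(*h));
	if (h == NULL)
		return NULL;
	for (i = 0; i < count; i++) {
		h[i].fd = -1;
		h[i].from_shared_mem = shared;
	}
	return h;
}

static void
toy_intr_instance_free(struct toy_intr_handle *h)
{
	free(h);
}
```

Passing the element count and element size to calloc() separately, rather than pre-multiplying them, also lets the allocator reject a product that would overflow.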


Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Harman Kalra



> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, October 14, 2021 3:11 PM
> To: Harman Kalra 
> Cc: David Marchand ; dev@dpdk.org; Raslan
> Darawsheh ; Ray Kinsella ; Dmitry
> Kozlyuk ; viachesl...@nvidia.com;
> ma...@nvidia.com
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement
> get set APIs
> 
> 14/10/2021 11:31, Harman Kalra:
> > From: Thomas Monjalon 
> > > 13/10/2021 20:52, Thomas Monjalon:
> > > > 13/10/2021 19:57, Harman Kalra:
> > > > > From: dev  On Behalf Of Harman Kalra
> > > > > > From: Thomas Monjalon 
> > > > > > > 04/10/2021 11:57, David Marchand:
> > > > > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra
> > > > > > > > 
> > > > > > > wrote:
> > > > > > > > > > > > +struct rte_intr_handle *
> > > > > > > > > > > > +rte_intr_handle_instance_alloc(int size, bool from_hugepage)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > > > > > +   int i;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   if (from_hugepage)
> > > > > > > > > > > > +      intr_handle = rte_zmalloc(NULL,
> > > > > > > > > > > > +         size * sizeof(struct rte_intr_handle), 0);
> > > > > > > > > > > > +   else
> > > > > > > > > > > > +      intr_handle = calloc(1,
> > > > > > > > > > > > +         size * sizeof(struct rte_intr_handle));
> > > > > > > > > >
> > > > > > > > > > We can call DPDK allocator in all cases.
> > > > > > > > > > That would avoid headaches on why multiprocess does
> > > > > > > > > > not work in some rarely tested cases.
> > > [...]
> > > > > > > I agree with David.
> > > > > > > I prefer a simpler API which always uses rte_malloc, and make
> > > > > > > sure interrupts are always handled between rte_eal_init and
> > > > > > > rte_eal_cleanup.
> > > [...]
> > > > > > There are couple of more dependencies on glibc heap APIs:
> > > > > > 1. "rte_eal_alarm_init()" allocates an interrupt instance
> > > > > > which is used for timerfd, is called before
> > > > > > "rte_eal_memory_init()" which does the memseg init.
> > > > > > Not sure what all challenges we may face in moving alarm_init
> > > > > > after memory_init as it might break some subsystem inits.
> > > > > > Other option could be to allocate interrupt instance for
> > > > > > timerfd on first alarm_setup call.
> > > >
> > > > Indeed it is an issue.
> > > >
> > > > [...]
> > > >
> > > > > > > There are many other drivers which statically declare the
> > > > > > > interrupt handles inside their respective private structures,
> > > > > > > and memory for those structures was allocated from heap. For
> > > > > > > such drivers I allocated interrupt instances also using glibc
> > > > > > > heap APIs.
> > > >
> > > > Could you use rte_malloc in these drivers?
> > >
> > > If we take the direction of 2 different allocations mode for the
> > > interrupts, I suggest we make it automatic without any API parameter.
> > > We don't have any function to check rte_malloc readiness I think.
> > > But we can detect whether shared memory is ready with this check:
> > > > rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC
> > > > This check is true at the end of rte_eal_init, so it is false during probing.
> > > Would it be enough? Or should we implement rte_malloc_is_ready()?
> >
> > Hi Thomas,
> >
> > It's a very good suggestion. Let's implement "rte_malloc_is_ready()"
> > which could be as simple as a
> > "rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC" check.
> > There may be more consumers for this API in future.
> 
> You cannot rely on the magic because it is set only after probing.
> For such API you need to have another internal flag to check that malloc is
> setup.

Yeah, got that. You mean that during bus probing rte_malloc is already set up,
but eal_mcfg_complete() is not done yet. So we should set another
malloc-specific flag at the end of rte_eal_memory_init(). Correct?

But just for my understanding: since David suggested that we keep this flag,
why not use it, have rte_malloc and malloc bits, and make a decision?
Let the driver have the flexibility to choose. Do you see any harm in this?

Thanks
Harman





[dpdk-dev] [PATCH v13 0/2] testpmd shows incorrect rx_offload configuration

2021-10-14 Thread Jie Wang
Launch testpmd with multiple queues, and check rx_offload info.

When testpmd shows the port configuration, it doesn't show RSS_HASH.

---
v13:
 - update the API comment.
 - fix the bug that testpmd failed to run test_pf_tx_rx_queue test case.
v12: update the commit log and the API comment.
v11:
 - update the commit log.
 - rename the function and variable name.
v10:
 - update the commit log.
 - merge the first two patches.
 - rename the new API name.
v9:
 - add a release notes update for the new API.
 - update the description of the new API.
 - optimize the new API.
 - optimize the assignment of the offloads.
v8: delete "rte_exit" and just print error log.
v7:
 - delete struct "rte_eth_dev_conf_info", and reuse struct "rte_eth_conf".
 - add "__rte_experimental" to the new API "rte_eth_dev_conf_info_get"
   declaration.
v6: split this patch into two patches.
v5: add an API to get device configuration info.
v4: delete the whitespace at the end of the line.
v3:
 - check and update the "offloads" of "port->dev_conf.rx/txmode".
 - update the commit log.
v2: copy "rx/txmode.offloads", instead of copying the entire struct
"dev->data->dev_conf.rx/txmode".

Jie Wang (2):
  ethdev: add an API to get device configuration
  app/testpmd: fix testpmd doesn't show RSS hash offload

 app/test-pmd/cmdline.c | 14 ++--
 app/test-pmd/testpmd.c | 48 --
 app/test-pmd/testpmd.h |  2 ++
 app/test-pmd/util.c| 14 
 doc/guides/rel_notes/release_21_11.rst |  4 +++
 lib/ethdev/rte_ethdev.c| 20 +++
 lib/ethdev/rte_ethdev.h| 18 ++
 lib/ethdev/version.map |  1 +
 8 files changed, 116 insertions(+), 5 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v13 1/2] ethdev: add an API to get device configuration

2021-10-14 Thread Jie Wang
The driver may change the offloads info in dev->data->dev_conf
during dev_configure, which may cause apps to use outdated values.

Add a new API to get the actual device configuration.

Acked-by: Andrew Rybchenko 
Signed-off-by: Jie Wang 
---
 doc/guides/rel_notes/release_21_11.rst |  4 
 lib/ethdev/rte_ethdev.c| 20 
 lib/ethdev/rte_ethdev.h| 18 ++
 lib/ethdev/version.map |  1 +
 4 files changed, 43 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index d5c762df62..5292149981 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -81,6 +81,10 @@ New Features
   * Default VLAN strip behavior was changed. VLAN tag won't be stripped
 unless ``DEV_RX_OFFLOAD_VLAN_STRIP`` offload is enabled.
 
+* **Added support for users get device configuration in ethdev.**
+
+  Added an ethdev API which can help users get device configuration.
+
 * **Updated AF_XDP PMD.**
 
   * Disabled secondary process support.
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5fae7357c8..063e4925f8 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -3437,6 +3437,26 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
return 0;
 }
 
+int
+rte_eth_dev_conf_get(uint16_t port_id, struct rte_eth_conf *dev_conf)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   if (dev_conf == NULL) {
+   RTE_ETHDEV_LOG(ERR,
+   "Cannot get ethdev port %u configuration to NULL\n",
+   port_id);
+   return -EINVAL;
+   }
+
+   memcpy(dev_conf, &dev->data->dev_conf, sizeof(struct rte_eth_conf));
+
+   return 0;
+}
+
 int
 rte_eth_dev_get_supported_ptypes(uint16_t port_id, uint32_t ptype_mask,
 uint32_t *ptypes, int num)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index cb847a2c38..58d10e5699 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -3052,6 +3052,24 @@ int rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr);
  */
 int rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve the configuration of an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param dev_conf
+ *   Location for Ethernet device configuration to be filled in.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_dev_conf_get(uint16_t port_id, struct rte_eth_conf *dev_conf);
+
 /**
  * Retrieve the firmware version of a device.
  *
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 29fb71f1af..4debda513b 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -247,6 +247,7 @@ EXPERIMENTAL {
rte_mtr_meter_policy_validate;
 
# added in 21.11
+   rte_eth_dev_conf_get;
rte_eth_rx_metadata_negotiate;
 };
 
-- 
2.25.1
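
For readers following the thread, the contract of the new getter (validate the port, reject a NULL output pointer, then copy the driver-side configuration out) can be modelled without DPDK. This is only a sketch: model_conf, model_devices, model_conf_get and MODEL_MAX_PORTS are hypothetical stand-ins invented for illustration, not DPDK symbols, and the struct is cut down to two offload fields.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_PORTS 4 /* hypothetical port limit for this sketch */

/* Cut-down stand-in for the device configuration structure. */
struct model_conf {
	uint64_t rx_offloads;
	uint64_t tx_offloads;
};

/* Driver-owned configuration, one entry per port. */
static struct model_conf model_devices[MODEL_MAX_PORTS];

/* Copy the driver-side configuration of a port into caller storage. */
static int model_conf_get(uint16_t port_id, struct model_conf *conf)
{
	if (port_id >= MODEL_MAX_PORTS)
		return -ENODEV; /* invalid port, checked first as in the patch */
	if (conf == NULL)
		return -EINVAL; /* nowhere to copy to */
	memcpy(conf, &model_devices[port_id], sizeof(*conf));
	return 0;
}
```

Because the caller receives a copy, later changes made by the driver are not visible until the getter is called again, which is why an application would re-read the configuration after each reconfiguration.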



[dpdk-dev] [PATCH v13 2/2] app/testpmd: fix testpmd doesn't show RSS hash offload

2021-10-14 Thread Jie Wang
The driver may change the offloads info in dev->data->dev_conf
during dev_configure, which may cause port->dev_conf and port->rx_conf
to contain outdated values.

This patch updates the cached offloads info when it has changed, to fix this issue.

Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")

Signed-off-by: Jie Wang 
---
 app/test-pmd/cmdline.c | 14 ++--
 app/test-pmd/testpmd.c | 48 +++---
 app/test-pmd/testpmd.h |  2 ++
 app/test-pmd/util.c| 14 
 4 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 36d50fd3c7..b8f06063d2 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -16034,6 +16034,7 @@ cmd_rx_offload_get_configuration_parsed(
struct rte_eth_dev_info dev_info;
portid_t port_id = res->port_id;
struct rte_port *port = &ports[port_id];
+   struct rte_eth_conf dev_conf;
uint64_t port_offloads;
uint64_t queue_offloads;
uint16_t nb_rx_queues;
@@ -16042,7 +16043,11 @@ cmd_rx_offload_get_configuration_parsed(
 
printf("Rx Offloading Configuration of port %d :\n", port_id);
 
-   port_offloads = port->dev_conf.rxmode.offloads;
+   ret = eth_dev_conf_get_print_err(port_id, &dev_conf);
+   if (ret != 0)
+   return;
+
+   port_offloads = dev_conf.rxmode.offloads;
printf("  Port :");
print_rx_offloads(port_offloads);
printf("\n");
@@ -16448,6 +16453,7 @@ cmd_tx_offload_get_configuration_parsed(
struct rte_eth_dev_info dev_info;
portid_t port_id = res->port_id;
struct rte_port *port = &ports[port_id];
+   struct rte_eth_conf dev_conf;
uint64_t port_offloads;
uint64_t queue_offloads;
uint16_t nb_tx_queues;
@@ -16456,7 +16462,11 @@ cmd_tx_offload_get_configuration_parsed(
 
printf("Tx Offloading Configuration of port %d :\n", port_id);
 
-   port_offloads = port->dev_conf.txmode.offloads;
+   ret = eth_dev_conf_get_print_err(port_id, &dev_conf);
+   if (ret != 0)
+   return;
+
+   port_offloads = dev_conf.txmode.offloads;
printf("  Port :");
print_tx_offloads(port_offloads);
printf("\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a7841c557f..6cb00882bb 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2582,6 +2582,9 @@ start_port(portid_t pid)
}
 
if (port->need_reconfig > 0) {
+   struct rte_eth_conf dev_conf;
+   int k;
+
port->need_reconfig = 0;
 
if (flow_isolate_all) {
@@ -2619,6 +2622,36 @@ start_port(portid_t pid)
port->need_reconfig = 1;
return -1;
}
+   /* get device configuration*/
+   if (0 !=
+   eth_dev_conf_get_print_err(pi, &dev_conf)) {
+   fprintf(stderr,
+   "port %d can not get device configuration\n",
+   pi);
+   return -1;
+   }
+   /* Apply Rx offloads configuration */
+   if (dev_conf.rxmode.offloads !=
+   port->dev_conf.rxmode.offloads) {
+   port->dev_conf.rxmode.offloads |=
+   dev_conf.rxmode.offloads;
+   for (k = 0;
+k < port->dev_info.max_rx_queues;
+k++)
+   port->rx_conf[k].offloads |=
+   dev_conf.rxmode.offloads;
+   }
+   /* Apply Tx offloads configuration */
+   if (dev_conf.txmode.offloads !=
+   port->dev_conf.txmode.offloads) {
+   port->dev_conf.txmode.offloads |=
+   dev_conf.txmode.offloads;
+   for (k = 0;
+k < port->dev_info.max_tx_queues;
+k++)
+   port->tx_conf[k].offloads |=
+   dev_conf.txmode.offloads;
+   }
}
if (port->need_reconfig_queues > 0 && is_proc_primary()) {
port->need_reconfig_queues = 0;
@@ -3581,7 +3614,7 @@ init_port_config(void)
 {
portid_t pid;
struct rte_port *port;
-   int ret;
+   int ret, i;
 
RTE_ETH_FOREACH_DEV(pid) {
port = &ports[pid];
@@ -3601,12 +3634,21 @@ init_port_config(vo
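
The start_port() hunk earlier in this patch reconciles the application's cached offloads with what the device reports, for the port and for every queue. A minimal stand-alone sketch of that merge step follows; model_port, model_sync_offloads and MODEL_MAX_QUEUES are hypothetical names used only for illustration and do not appear in testpmd.

```c
#include <stdint.h>

#define MODEL_MAX_QUEUES 8 /* hypothetical queue limit for this sketch */

struct model_port {
	uint64_t port_offloads;                    /* application-side cached value */
	uint64_t queue_offloads[MODEL_MAX_QUEUES]; /* per-queue cached values */
	uint16_t nb_queues;
};

/* Merge offload bits reported by the device into the cached copies. */
static void model_sync_offloads(struct model_port *port, uint64_t dev_offloads)
{
	uint16_t q;

	if (dev_offloads == port->port_offloads)
		return; /* nothing changed */
	port->port_offloads |= dev_offloads;
	for (q = 0; q < port->nb_queues && q < MODEL_MAX_QUEUES; q++)
		port->queue_offloads[q] |= dev_offloads;
}
```

Note that a bitwise OR only ever adds bits: an offload the driver enabled becomes visible in the cache, but an offload the driver cleared would remain set in the cached value.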

Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Thomas Monjalon
14/10/2021 12:31, Harman Kalra:
> From: Thomas Monjalon 
> > 14/10/2021 11:31, Harman Kalra:
> > > From: Thomas Monjalon 
> > > > 13/10/2021 20:52, Thomas Monjalon:
> > > > > 13/10/2021 19:57, Harman Kalra:
> > > > > > From: dev  On Behalf Of Harman Kalra
> > > > > > > From: Thomas Monjalon 
> > > > > > > > 04/10/2021 11:57, David Marchand:
> > > > > > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra
> > > > > > > > > 
> > > > > > > > wrote:
> > > > > > > > > > > > > +struct rte_intr_handle *
> > > > > > > > > > > > > +rte_intr_handle_instance_alloc(int size, bool from_hugepage)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > > > > > > +   int i;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +   if (from_hugepage)
> > > > > > > > > > > > > +      intr_handle = rte_zmalloc(NULL,
> > > > > > > > > > > > > +         size * sizeof(struct rte_intr_handle), 0);
> > > > > > > > > > > > > +   else
> > > > > > > > > > > > > +      intr_handle = calloc(1,
> > > > > > > > > > > > > +         size * sizeof(struct rte_intr_handle));
> > > > > > > > > > >
> > > > > > > > > > > We can call DPDK allocator in all cases.
> > > > > > > > > > > That would avoid headaches on why multiprocess does
> > > > > > > > > > > not work in some rarely tested cases.
> > > > [...]
> > > > > > > > I agree with David.
> > > > > > > > I prefer a simpler API which always uses rte_malloc, and make
> > > > > > > > sure interrupts are always handled between rte_eal_init and
> > > > > > > > rte_eal_cleanup.
> > > > [...]
> > > > > > > There are couple of more dependencies on glibc heap APIs:
> > > > > > > 1. "rte_eal_alarm_init()" allocates an interrupt instance
> > > > > > > which is used for timerfd, is called before
> > > > > > > "rte_eal_memory_init()" which does the memseg init.
> > > > > > > Not sure what all challenges we may face in moving alarm_init
> > > > > > > after memory_init as it might break some subsystem inits.
> > > > > > > Other option could be to allocate interrupt instance for
> > > > > > > timerfd on first alarm_setup call.
> > > > >
> > > > > Indeed it is an issue.
> > > > >
> > > > > [...]
> > > > >
> > > > > > > > There are many other drivers which statically declare the
> > > > > > > > interrupt handles inside their respective private structures,
> > > > > > > > and memory for those structures was allocated from heap. For
> > > > > > > > such drivers I allocated interrupt instances also using glibc
> > > > > > > > heap APIs.
> > > > >
> > > > > Could you use rte_malloc in these drivers?
> > > >
> > > > If we take the direction of 2 different allocations mode for the
> > > > interrupts, I suggest we make it automatic without any API parameter.
> > > > We don't have any function to check rte_malloc readiness I think.
> > > > But we can detect whether shared memory is ready with this check:
> > > > > rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC
> > > > > This check is true at the end of rte_eal_init, so it is false during probing.
> > > > Would it be enough? Or should we implement rte_malloc_is_ready()?
> > >
> > > Hi Thomas,
> > >
> > > It's a very good suggestion. Let's implement "rte_malloc_is_ready()"
> > > which could be as simple as a
> > > "rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC" check.
> > > There may be more consumers for this API in future.
> > 
> > You cannot rely on the magic because it is set only after probing.
> > For such API you need to have another internal flag to check that malloc is
> > setup.
> 
> Yeah, got that. You mean that during bus probing rte_malloc is already set up,
> but eal_mcfg_complete() is not done yet. So we should set another
> malloc-specific flag at the end of rte_eal_memory_init(). Correct?

I think the new internal flag should be set at the end of
rte_eal_malloc_heap_init().
Then a rte_internal function rte_malloc_is_ready() should check this flag.
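
The idea in this reply can be sketched as a small stand-alone model: an internal flag is set as the last step of heap initialisation, a readiness helper reads it, and the interrupt-instance allocator chooses its backend from that helper instead of from a caller-supplied argument. Everything below is an assumption made for illustration; heap_init, heap_is_ready, intr_instance_alloc and the alloc_source enum are hypothetical names rather than DPDK symbols, and both paths use calloc() here so the sketch builds without DPDK.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are DPDK symbols. */
enum alloc_source { ALLOC_LIBC, ALLOC_HUGEPAGE_HEAP };

struct intr_instance {
	enum alloc_source source; /* which allocator the real code would use */
	int fd;
};

/* Internal flag, false until the heap has been initialised. */
static bool heap_ready;

/* Models the tail of the heap init routine: set the flag as the last step. */
static int heap_init(void)
{
	/* ... heap setup would happen here ... */
	heap_ready = true;
	return 0;
}

/* Models the proposed internal readiness check. */
static bool heap_is_ready(void)
{
	return heap_ready;
}

/* The allocator picks its backend automatically; callers pass no flag. */
static struct intr_instance *intr_instance_alloc(void)
{
	struct intr_instance *inst = calloc(1, sizeof(*inst));

	if (inst == NULL)
		return NULL;
	inst->source = heap_is_ready() ? ALLOC_HUGEPAGE_HEAP : ALLOC_LIBC;
	inst->fd = -1;
	return inst;
}
```

An instance allocated before heap_init() runs (as the alarm subsystem would do) is tagged ALLOC_LIBC, and one allocated afterwards is tagged ALLOC_HUGEPAGE_HEAP. Recording the source in the instance is one way to let the matching free routine pick the right deallocator.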

> But just for my understanding: since David suggested that we keep this flag,
> why not use it, have rte_malloc and malloc bits, and make a decision?
> Let the driver have the flexibility to choose. Do you see any harm in this?

Which flag?





Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-14 Thread Harman Kalra



> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, October 14, 2021 4:06 PM
> To: Harman Kalra 
> Cc: David Marchand ; dev@dpdk.org; Raslan
> Darawsheh ; Ray Kinsella ; Dmitry
> Kozlyuk ; viachesl...@nvidia.com;
> ma...@nvidia.com
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement
> get set APIs
> 
> 14/10/2021 12:31, Harman Kalra:
> > From: Thomas Monjalon 
> > > 14/10/2021 11:31, Harman Kalra:
> > > > From: Thomas Monjalon 
> > > > > 13/10/2021 20:52, Thomas Monjalon:
> > > > > > 13/10/2021 19:57, Harman Kalra:
> > > > > > > From: dev  On Behalf Of Harman Kalra
> > > > > > > > From: Thomas Monjalon 
> > > > > > > > > 04/10/2021 11:57, David Marchand:
> > > > > > > > > > On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra
> > > > > > > > > > 
> > > > > > > > > wrote:
> > > > > > > > > > > > > > +struct rte_intr_handle *
> > > > > > > > > > > > > > +rte_intr_handle_instance_alloc(int size, bool from_hugepage)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +   struct rte_intr_handle *intr_handle;
> > > > > > > > > > > > > > +   int i;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +   if (from_hugepage)
> > > > > > > > > > > > > > +      intr_handle = rte_zmalloc(NULL,
> > > > > > > > > > > > > > +         size * sizeof(struct rte_intr_handle), 0);
> > > > > > > > > > > > > > +   else
> > > > > > > > > > > > > > +      intr_handle = calloc(1,
> > > > > > > > > > > > > > +         size * sizeof(struct rte_intr_handle));
> > > > > > > > > > > >
> > > > > > > > > > > > We can call DPDK allocator in all cases.
> > > > > > > > > > > > That would avoid headaches on why multiprocess
> > > > > > > > > > > > does not work in some rarely tested cases.
> > > > > [...]
> > > > > > > > > I agree with David.
> > > > > > > > > I prefer a simpler API which always uses rte_malloc, and
> > > > > > > > > make sure interrupts are always handled between
> > > > > > > > > rte_eal_init and rte_eal_cleanup.
> > > > > [...]
> > > > > > > > There are couple of more dependencies on glibc heap APIs:
> > > > > > > > 1. "rte_eal_alarm_init()" allocates an interrupt instance
> > > > > > > > which is used for timerfd, is called before
> > > > > > > > "rte_eal_memory_init()" which does the memseg init.
> > > > > > > > Not sure what all challenges we may face in moving
> > > > > > > > alarm_init after memory_init as it might break some
> > > > > > > > subsystem inits.
> > > > > > > > Other option could be to allocate interrupt instance for
> > > > > > > > timerfd on first alarm_setup call.
> > > > > >
> > > > > > Indeed it is an issue.
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > There are many other drivers which statically declare the
> > > > > > > > interrupt handles inside their respective private
> > > > > > > > structures, and memory for those structures was allocated
> > > > > > > > from heap. For such drivers I allocated interrupt
> > > > > > > > instances also using glibc heap APIs.
> > > > > >
> > > > > > Could you use rte_malloc in these drivers?
> > > > >
> > > > > If we take the direction of 2 different allocations mode for the
> > > > > interrupts, I suggest we make it automatic without any API parameter.
> > > > > We don't have any function to check rte_malloc readiness I think.
> > > > > But we can detect whether shared memory is ready with this check:
> > > > > rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC
> > > > > This check is true at the end of rte_eal_init, so it is false
> > > > > during probing.
> > > > > Would it be enough? Or should we implement rte_malloc_is_ready()?
> > > >
> > > > Hi Thomas,
> > > >
> > > > It's a very good suggestion. Let's implement "rte_malloc_is_ready()"
> > > > which could be as simple as a
> > > > "rte_eal_get_configuration()->mem_config->magic == RTE_MAGIC" check.
> > > > There may be more consumers for this API in future.
> > >
> > > You cannot rely on the magic because it is set only after probing.
> > > For such API you need to have another internal flag to check that
> > > malloc is setup.
> >
> > Yeah, got that. You mean that during bus probing rte_malloc is already
> > set up, but eal_mcfg_complete() is not done yet. So we should set
> > another malloc-specific flag at the end of rte_eal_memory_init(). Correct?
> 
> I think the new internal flag should be at the end of
> rte_eal_malloc_heap_init().
> Then a rte_internal function rte_malloc_is_ready() should check this flag.

Sure.

> 
> > But just for my understanding: since David suggested that we keep this
> > flag, why not use it, have rte_malloc and malloc bits, and make a decision?
> > Let the driver have the flexibility to choose. Do you see any harm in this?
> 
> Which flag?

In V2, I have replaced the bool arg with an 32bit flag i

[dpdk-dev] [PATCH v9 0/4] improve telemetry support with in-memory mode

2021-10-14 Thread Bruce Richardson
This patchset cleans up telemetry support for "in-memory" mode, so that
multiple independent processes can be run using that mode and still have
telemetry support. It also removes problems of one process removing the
socket of another - which was the original issue reported. The main changes
in this set are to:

* disable telemetry for secondary processes, which prevents any socket
  conflicts in multi-process cases.
* when multiple processes are run using the same runtime directory (i.e.
  "in-memory" mode or similar), add a counter suffix to the socket names to
  avoid conflicts over the socket. Each process will use the lowest available
  suffix, with the first process using the directory, not adding any suffix.
* update the telemetry script and documentation to allow it to connect to
  in-memory DPDK processes.

---
V9: sort output lines in help text in script

V8: Merged patches 2 & 3 of the set together. Fixed some checkpatch warnings
flagged by the CI.

V7: Change from adding a pid suffix generally in "in-memory" mode, to adding an
increasing counter as a suffix in case of name conflicts generally. This
achieves the same result in terms of connectivity, but keeps compatibility of
behaviour for the case of a single in-memory process, while also providing
more predictable socket names for each process, i.e. with 4 running in-memory
instances, they will always use suffixes 1-3 for the extra 3 sockets, even
across restarts.

V6: fixed issue whereby the failing of the legacy telemetry init would roll-back
init of the v2 telemetry, causing the socket to be deleted, even though it was
still necessary.

V5: Rebase on latest main after other script cleanups were merged

V4: Move from simple-fix patch to proper fix patchset

V3: Drop CC stable, as will have separate backport patch which does not
error out, so avoiding causing problems with currently running application

V2: fix build error on FreeBSD

Bruce Richardson (4):
  eal: limit telemetry to primary processes
  telemetry: fix socket path conflicts for in-memory mode
  usertools/dpdk-telemetry: connect to separate instances
  usertools/dpdk-telemetry: provide info on available sockets

 doc/guides/howto/telemetry.rst | 41 +
 lib/eal/freebsd/eal.c  |  2 +-
 lib/eal/linux/eal.c|  2 +-
 lib/telemetry/telemetry.c  | 65 +-
 usertools/dpdk-telemetry.py| 45 ---
 5 files changed, 133 insertions(+), 22 deletions(-)

--
2.30.2
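
The "lowest available suffix" rule described in this cover letter can be sketched as a small helper: try the bare socket name first, then ":1", ":2" and so on until a name is found that no live process owns. This is an illustrative model only. pick_socket_path, path_in_use_fn and demo_in_use are hypothetical names, and the "in use" probe is a caller-supplied callback here, whereas the patch itself probes by attempting bind() and connect() on the real socket.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Caller-supplied probe: returns true when a live process owns the path. */
typedef bool (*path_in_use_fn)(const char *path);

/*
 * Write the first free socket path into out: the bare base path first,
 * then "base:1", "base:2", ... Returns the suffix used (0 for none),
 * or -1 if the path does not fit or max_suffix is exhausted.
 */
static int pick_socket_path(char *out, size_t len, const char *base,
		int max_suffix, path_in_use_fn in_use)
{
	int suffix = 0;

	if (snprintf(out, len, "%s", base) >= (int)len)
		return -1;
	while (in_use(out)) {
		if (++suffix > max_suffix)
			return -1;
		if (snprintf(out, len, "%s:%d", base, suffix) >= (int)len)
			return -1;
	}
	return suffix;
}

/* Demo probe for this sketch: pretends the first two names are taken. */
static bool demo_in_use(const char *path)
{
	return strcmp(path, "dpdk_telemetry.v2") == 0 ||
	       strcmp(path, "dpdk_telemetry.v2:1") == 0;
}
```

With the demo probe, a third process asking for "dpdk_telemetry.v2" ends up with "dpdk_telemetry.v2:2". Because each process takes the first free name, a restarted process reuses the slot it vacated, which is what makes the names predictable across restarts.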



[dpdk-dev] [PATCH v9 1/4] eal: limit telemetry to primary processes

2021-10-14 Thread Bruce Richardson
Telemetry interface should be exposed for primary processes only, since
secondary processes will conflict on socket creation, and since all
data in secondary process is generally available to primary. For
example, all device stats for ethdevs, cryptodevs, etc. will all be
common across processes.

Signed-off-by: Bruce Richardson 
Acked-by: Ciara Power 
Tested-by: Conor Walsh 
---
 lib/eal/freebsd/eal.c | 2 +-
 lib/eal/linux/eal.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index fb734012a4..56a60f13e9 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -950,7 +950,7 @@ rte_eal_init(int argc, char **argv)
rte_eal_init_alert("Cannot clear runtime directory");
return -1;
}
-   if (!internal_conf->no_telemetry) {
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY && !internal_conf->no_telemetry) {
int tlog = rte_log_register_type_and_pick_level(
"lib.telemetry", RTE_LOG_WARNING);
if (tlog < 0)
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3577eaeaa4..0d0fc8 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1320,7 +1320,7 @@ rte_eal_init(int argc, char **argv)
rte_eal_init_alert("Cannot clear runtime directory");
return -1;
}
-   if (!internal_conf->no_telemetry) {
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY && !internal_conf->no_telemetry) {
int tlog = rte_log_register_type_and_pick_level(
"lib.telemetry", RTE_LOG_WARNING);
if (tlog < 0)
-- 
2.30.2



[dpdk-dev] [PATCH v9 2/4] telemetry: fix socket path conflicts for in-memory mode

2021-10-14 Thread Bruce Richardson
When running using in-memory mode, multiple processes can use the same
runtime dir, leading to conflicts with the telemetry sockets in that
directory. We can resolve this by appending a suffix to each socket
beyond the first, with the suffix being an increasing counter value.
Each process uses the first unused socket counter value.

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")

Reported-by: David Marchand 
Signed-off-by: Bruce Richardson 
Acked-by: Ciara Power 
Acked-by: Kevin Traynor 
Tested-by: Conor Walsh 
---
 lib/telemetry/telemetry.c | 65 +--
 1 file changed, 49 insertions(+), 16 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 48f4c7ba46..a7483167d4 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -457,28 +457,45 @@ create_socket(char *path)
 
struct sockaddr_un sun = {.sun_family = AF_UNIX};
strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-   unlink(sun.sun_path);
+   TMTY_LOG(DEBUG, "Attempting socket bind to path '%s'\n", path);
+
if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
struct stat st;
 
-   TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
-   if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode))
+   TMTY_LOG(DEBUG, "Initial bind to socket '%s' failed.\n", path);
+
+   /* first check if we have a runtime dir */
+   if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode)) {
+   TMTY_LOG(ERR, "Cannot access DPDK runtime directory: %s\n", socket_dir);
-   sun.sun_path[0] = 0;
-   goto error;
+   close(sock);
+   return -ENOENT;
+   }
+
+   /* check if current socket is active */
+   if (connect(sock, (void *)&sun, sizeof(sun)) == 0) {
+   close(sock);
+   return -EADDRINUSE;
+   }
+
+   /* socket is not active, delete and attempt rebind */
+   TMTY_LOG(DEBUG, "Attempting unlink and retrying bind\n");
+   unlink(sun.sun_path);
+   if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
+   TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
+   close(sock);
+   return -errno; /* if unlink failed, this will be -EADDRINUSE as above */
+   }
}
 
if (listen(sock, 1) < 0) {
TMTY_LOG(ERR, "Error calling listen for socket: %s\n", strerror(errno));
-   goto error;
+   unlink(sun.sun_path);
+   close(sock);
+   return -errno;
}
+   TMTY_LOG(DEBUG, "Socket creation and binding ok\n");
 
return sock;
-
-error:
-   close(sock);
-   unlink_sockets();
-   return -1;
 }
 
 static void
@@ -511,8 +528,10 @@ telemetry_legacy_init(void)
return -1;
}
v1_socket.sock = create_socket(v1_socket.path);
-   if (v1_socket.sock < 0)
+   if (v1_socket.sock < 0) {
+   v1_socket.path[0] = '\0';
return -1;
+   }
rc = pthread_create(&t_old, NULL, socket_listener, &v1_socket);
if (rc != 0) {
TMTY_LOG(ERR, "Error with create legcay socket thread: %s\n",
@@ -533,7 +552,9 @@ telemetry_legacy_init(void)
 static int
 telemetry_v2_init(void)
 {
+   char spath[sizeof(v2_socket.path)];
pthread_t t_new;
+   short suffix = 0;
int rc;
 
v2_socket.num_clients = &v2_clients;
@@ -544,15 +565,27 @@ telemetry_v2_init(void)
rte_telemetry_register_cmd("/help", command_help,
"Returns help text for a command. Parameters: string command");
v2_socket.fn = client_handler;
-   if (strlcpy(v2_socket.path, get_socket_path(socket_dir, 2),
-   sizeof(v2_socket.path)) >= sizeof(v2_socket.path)) {
+   if (strlcpy(spath, get_socket_path(socket_dir, 2), sizeof(spath)) >= sizeof(spath)) {
TMTY_LOG(ERR, "Error with socket binding, path too long\n");
return -1;
}
+   memcpy(v2_socket.path, spath, sizeof(v2_socket.path));
 
v2_socket.sock = create_socket(v2_socket.path);
-   if (v2_socket.sock < 0)
-   return -1;
+   while (v2_socket.sock < 0) {
+   /* bail out on unexpected error, or suffix wrap-around */
+   if (v2_socket.sock != -EADDRINUSE || suffix < 0) {
+   v2_socket.path[0] = '\0'; /* clear socket path */
+   return -1;
+   }
+   /* add a suffix to the path if the basic version fails */
+   if (snprintf(v2_socket.path, sizeof(v2_socket.path), "%s:%d",
+   spath, ++suffix) >= 
(int)sizeof(v2_socket.path)) {
+  

[dpdk-dev] [PATCH v9 3/4] usertools/dpdk-telemetry: connect to separate instances

2021-10-14 Thread Bruce Richardson
For processes run using "in-memory" mode sharing the same runtime dir,
we add support for connecting to the separate instance sockets created
using ":1", ":2" etc. via new "-i" or "--instance" argument. Add details
on connecting to separate instances to the telemetry howto document.

Signed-off-by: Bruce Richardson 
Acked-by: Ciara Power 
Tested-by: Conor Walsh 
---
 doc/guides/howto/telemetry.rst | 41 ++
 usertools/dpdk-telemetry.py|  7 +-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/doc/guides/howto/telemetry.rst b/doc/guides/howto/telemetry.rst
index 8f4fa1a510..e4edb53fa4 100644
--- a/doc/guides/howto/telemetry.rst
+++ b/doc/guides/howto/telemetry.rst
@@ -87,3 +87,44 @@ and query information using the telemetry client python script.
--> /help,/ethdev/xstats
{"/help": {"/ethdev/xstats": "Returns the extended stats for a port.
Parameters: int port_id"}}
+
+
+Connecting to Different DPDK Processes
+--------------------------------------
+
+When multiple DPDK process instances are running on a system, the user will
+naturally wish to be able to select the instance to which the connection is
+being made. The method to select the instance depends on how the individual
+instances are run:
+
+* For DPDK processes run using a non-default file-prefix,
+  i.e. using the `--file-prefix` EAL option flag,
+  the file-prefix for the process should be passed via the `-f` or `--file-prefix` script flag.
+
+  For example, to connect to testpmd run as::
+
+     $ ./build/app/dpdk-testpmd -l 2,3 --file-prefix="tpmd"
+
+  One would use the telemetry script command::
+
+     $ ./usertools/dpdk-telemetry.py -f "tpmd"
+
+* For the case where multiple processes are run using the `--in-memory` EAL 
flag,
+  but no `-file-prefix` flag, or the same `-file-prefix` flag,
+  those processes will all share the same runtime directory.
+  In this case,
+  each process after the first will add an increasing count suffix to the telemetry socket name,
+  with each one taking the first available free socket name.
+  This suffix count can be passed to the telemetry script using the `-i` or `--instance` flag.
+
+  For example, if the following two applications are run in separate terminals::
+
+ $ ./build/app/dpdk-testpmd -l 2,3 --in-memory    # will use socket "dpdk_telemetry.v2"
+
+ $ ./build/app/test/dpdk-test -l 4,5 --in-memory  # will use "dpdk_telemetry.v2:1"
+
+  The following telemetry script commands would allow one to connect to each binary::
+
+ $ ./usertools/dpdk-telemetry.py   # will connect to testpmd
+
+ $ ./usertools/dpdk-telemetry.py -i 1  # will connect to test binary
diff --git a/usertools/dpdk-telemetry.py b/usertools/dpdk-telemetry.py
index 2974a64732..ce27548c3e 100755
--- a/usertools/dpdk-telemetry.py
+++ b/usertools/dpdk-telemetry.py
@@ -112,6 +112,11 @@ def get_dpdk_runtime_dir(fp):
 parser = argparse.ArgumentParser()
 parser.add_argument('-f', '--file-prefix', default='rte',
 help='Provide file-prefix for DPDK runtime directory')
+parser.add_argument('-i', '--instance', default=0, type=int,
+help='Provide instance number for DPDK application')
 args = parser.parse_args()
 rd = get_dpdk_runtime_dir(args.file_prefix)
-handle_socket(os.path.join(rd, 'dpdk_telemetry.{}'.format(TELEMETRY_VERSION)))
+sock_path = os.path.join(rd, 'dpdk_telemetry.{}'.format(TELEMETRY_VERSION))
+if args.instance > 0:
+sock_path += ":{}".format(args.instance)
+handle_socket(sock_path)
-- 
2.30.2
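
As a rough illustration of the socket naming that the patch above relies on, here is a minimal sketch. The helper name telemetry_socket_path is hypothetical (it is not part of the script); the runtime directory in the example is the one shown elsewhere in this thread.

```python
import os

TELEMETRY_VERSION = "v2"
SOCKET_NAME = "dpdk_telemetry.{}".format(TELEMETRY_VERSION)


def telemetry_socket_path(runtime_dir, instance=0):
    """Return the socket path for an instance (hypothetical helper).

    Instance 0 uses the plain socket name; later instances append
    ":<n>", mirroring the suffix logic added by the patch.
    """
    path = os.path.join(runtime_dir, SOCKET_NAME)
    if instance > 0:
        path += ":{}".format(instance)
    return path


# Example runtime directory taken from the error output quoted in this thread.
print(telemetry_socket_path("/run/user/1000/dpdk/rte"))
print(telemetry_socket_path("/run/user/1000/dpdk/rte", 1))
```

Keeping the suffix rule in one function makes it easy to see that "-i 0" and no "-i" at all resolve to the same socket.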



[dpdk-dev] [PATCH v9 4/4] usertools/dpdk-telemetry: provide info on available sockets

2021-10-14 Thread Bruce Richardson
When a user runs the dpdk-telemetry script and fails to connect because
the socket path does not exist, run a scan for possible sockets that
could be connected to and inform the user of the command needed to
connect to those.

For example:

  $ ./dpdk-telemetry.py -i4
  Connecting to /run/user/1000/dpdk/rte/dpdk_telemetry.v2:4
  Error connecting to /run/user/1000/dpdk/rte/dpdk_telemetry.v2:4

  Other DPDK telemetry sockets found:
  - dpdk_telemetry.v2  # Connect with './dpdk-telemetry.py'
  - dpdk_telemetry.v2:2  # Connect with './dpdk-telemetry.py -i 2'
  - dpdk_telemetry.v2:1  # Connect with './dpdk-telemetry.py -i 1'

Signed-off-by: Bruce Richardson 
Acked-by: Ciara Power 
Reviewed-by: Conor Walsh 
---
 usertools/dpdk-telemetry.py | 42 -
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/usertools/dpdk-telemetry.py b/usertools/dpdk-telemetry.py
index ce27548c3e..8f7d59d139 100755
--- a/usertools/dpdk-telemetry.py
+++ b/usertools/dpdk-telemetry.py
@@ -10,6 +10,7 @@
 import socket
 import os
 import sys
+import glob
 import json
 import errno
 import readline
@@ -17,6 +18,8 @@
 
 # global vars
 TELEMETRY_VERSION = "v2"
+SOCKET_NAME = 'dpdk_telemetry.{}'.format(TELEMETRY_VERSION)
+DEFAULT_PREFIX = 'rte'
 CMDS = []
 
 
@@ -48,7 +51,28 @@ def get_app_name(pid):
 return None
 
 
-def handle_socket(path):
+def find_sockets(path):
+""" Find any possible sockets to connect to and return them """
+return glob.glob(os.path.join(path, SOCKET_NAME + '*'))
+
+
+def print_socket_options(prefix, paths):
+""" Given a set of socket paths, give the commands needed to connect """
+cmd = sys.argv[0]
+if prefix != DEFAULT_PREFIX:
+cmd += " -f " + prefix
+for s in sorted(paths):
+sock_name = os.path.basename(s)
+if sock_name.endswith(TELEMETRY_VERSION):
+print("- {}  # Connect with '{}'".format(os.path.basename(s),
+ cmd))
+else:
+print("- {}  # Connect with '{} -i {}'".format(os.path.basename(s),
+   cmd,
+   s.split(':')[-1]))
+
+
+def handle_socket(args, path):
 """ Connect to socket and handle user input """
 prompt = ''  # this evaluates to false in conditions
 sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
@@ -62,6 +86,15 @@ def handle_socket(path):
 except OSError:
 print("Error connecting to " + path)
 sock.close()
+# if socket exists but is bad, or if non-interactive just return
+if os.path.exists(path) or not prompt:
+return
+# if user didn't give a valid socket path, but there are
+# some sockets, help the user out by printing how to connect
+socks = find_sockets(os.path.dirname(path))
+if socks:
+print("\nOther DPDK telemetry sockets found:")
+print_socket_options(args.file_prefix, socks)
 return
 json_reply = read_socket(sock, 1024, prompt)
 output_buf_len = json_reply["max_output_len"]
@@ -110,13 +143,12 @@ def get_dpdk_runtime_dir(fp):
 readline.set_completer_delims(readline.get_completer_delims().replace('/', ''))
 
 parser = argparse.ArgumentParser()
-parser.add_argument('-f', '--file-prefix', default='rte',
+parser.add_argument('-f', '--file-prefix', default=DEFAULT_PREFIX,
 help='Provide file-prefix for DPDK runtime directory')
 parser.add_argument('-i', '--instance', default=0, type=int,
 help='Provide instance number for DPDK application')
 args = parser.parse_args()
-rd = get_dpdk_runtime_dir(args.file_prefix)
-sock_path = os.path.join(rd, 'dpdk_telemetry.{}'.format(TELEMETRY_VERSION))
+sock_path = os.path.join(get_dpdk_runtime_dir(args.file_prefix), SOCKET_NAME)
 if args.instance > 0:
 sock_path += ":{}".format(args.instance)
-handle_socket(sock_path)
+handle_socket(args, sock_path)
-- 
2.30.2
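
The socket scan described in this patch can be tried outside the script with something like the sketch below. It assumes a Linux-style filesystem (file names containing ':' are allowed), uses plain files as stand-ins for real sockets, and the function name suggest_commands is hypothetical.

```python
import glob
import os
import tempfile

SOCKET_NAME = "dpdk_telemetry.v2"
DEFAULT_PREFIX = "rte"


def suggest_commands(runtime_dir, prefix=DEFAULT_PREFIX,
                     script="./dpdk-telemetry.py"):
    """List sockets in runtime_dir with the command to reach each one."""
    cmd = script
    if prefix != DEFAULT_PREFIX:
        cmd += " -f " + prefix
    lines = []
    pattern = os.path.join(runtime_dir, SOCKET_NAME + "*")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if name == SOCKET_NAME:
            # first instance: no -i flag needed
            lines.append("- {}  # Connect with '{}'".format(name, cmd))
        else:
            # later instances carry a ":<n>" suffix
            instance = name.split(":")[-1]
            lines.append("- {}  # Connect with '{} -i {}'".format(
                name, cmd, instance))
    return lines


# Stand-in runtime directory with empty files instead of real sockets.
with tempfile.TemporaryDirectory() as tmp:
    for name in (SOCKET_NAME, SOCKET_NAME + ":1", SOCKET_NAME + ":2"):
        open(os.path.join(tmp, name), "w").close()
    for line in suggest_commands(tmp):
        print(line)
```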



Re: [dpdk-dev] [PATCH v6 0/2] net: introduce IPv4 ihl and version fields

2021-10-14 Thread Ferruh Yigit

On 10/14/2021 9:30 AM, Thomas Monjalon wrote:

14/10/2021 10:21, Ferruh Yigit:

On 10/13/2021 6:13 PM, Gregory Etelson wrote:

Gregory Etelson (2):
net: fix IPv4 change announce
net: introduce IPv4 ihl and version fields



Hi Gregory,

Can you please change the order of the first and second patch?

This way I can get the first one, since it is already acked, before -rc1,
and continue reviews for second one, it will be OK since it is a
doc patch.


It makes more sense in this order I think.
The first patch is just dropping a deprecation note, I can ack.



It will be the same, I think: the first patch can have the implementation
and remove the implemented part of the deprecation notice;
the remaining part of the deprecation notice can be removed later, with or
without its implementation.

Anyway, this is for operational needs; if the first patch gets enough
acks/reviews in time, no update is required and I can get both patches.


Re: [dpdk-dev] [PATCH v6 1/2] net: fix IPv4 change announce

2021-10-14 Thread Ferruh Yigit

On 10/14/2021 9:37 AM, Thomas Monjalon wrote:

13/10/2021 19:13, Gregory Etelson:

IPv4 header encodes fragment information into 16 bits field.
3 bits hold flags and remaining 13 bits are for fragment offset.
13 bits bit-field cannot be defined both for big and little endian
systems.

The patch removes IPv4 fragments union announce.

Fixes: f7383e7c7ec1 ("net: announce changes in IPv4 header access")

Signed-off-by: Gregory Etelson 


OK to drop this announce.
There is no implementation anyway,
it will be back in one year if there is a solution.



If there is an option to have it back, why not keep it in the deprecation
notice? This ensures we won't forget it.


Acked-by: Thomas Monjalon 
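
For reference, the 16-bit fragment field discussed in the commit message (3 flag bits followed by a 13-bit offset, in network byte order) can be handled with shifts and masks instead of endian-dependent bit-fields. The sketch below is only an illustration; the function names are hypothetical and are not DPDK APIs.

```python
import struct

FLAGS_SHIFT = 13        # flags occupy the top 3 bits
OFFSET_MASK = 0x1FFF    # fragment offset occupies the low 13 bits


def split_fragment_field(raw):
    """Split 2 network-order bytes into (flags, fragment offset)."""
    (value,) = struct.unpack("!H", raw)
    return value >> FLAGS_SHIFT, value & OFFSET_MASK


def pack_fragment_field(flags, offset):
    """Build the 2-byte network-order field from flags and offset."""
    value = ((flags & 0x7) << FLAGS_SHIFT) | (offset & OFFSET_MASK)
    return struct.pack("!H", value)


# "More fragments" bit set, offset of 185 eight-byte units.
flags, offset = split_fragment_field(b"\x20\xb9")
print(flags, offset)
```

Because the value is converted from network order once and then masked, the same code behaves identically on big- and little-endian hosts.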






Re: [dpdk-dev] [PATCH v6 0/2] net: introduce IPv4 ihl and version fields

2021-10-14 Thread Ferruh Yigit

On 10/14/2021 10:29 AM, Gregory Etelson wrote:

Hello Ferruh,


On 10/13/2021 6:13 PM, Gregory Etelson wrote:

Gregory Etelson (2):
net: fix IPv4 change announce
net: introduce IPv4 ihl and version fields



Hi Gregory,

Can you please change the order of the first and
second patch?



In the existing order, the code patch reflects announced changes.



Overall, you are implementing the announced changes only partially, not fully.

The question is how to manage the part that was announced but not implemented.
I am for separating the discussion of that part from the code change that is
already acked.


This way I can get the first one, since it is already
acked, before -rc1,
and continue reviews for second one, it will be
OK since it is a
doc patch.

Thanks,
ferruh




Re: [dpdk-dev] [PATCH v2] net/mlx5: close tools socket with the last device

2021-10-14 Thread David Marchand
On Thu, Oct 14, 2021 at 10:55 AM Dmitry Kozlyuk  wrote:
>
> MLX5 PMD exposes a socket for external tools to dump port state.
> Socket events are listened using an interrupt source of EXT type.
> The socket was closed and the interrupt callback was unregistered
> at program exit, which is incorrect because DPDK could be already
> shut down at this point. Move actions performed at program exit
> to the moment the last MLX5 port is closed. The socket will be opened
> again if later a new MLX5 device is plugged in and probed.
> Also fix comments that were deceptively talking
> about secondary processes instead of external tools.

+1 for fixing those comments.

>
> Fixes: e6cdc54cc0ef ("net/mlx5: add socket server for external tools")
> Cc: Xueming Li 
> Cc: sta...@dpdk.org
>
> Reported-by: Harman Kalra 
> Signed-off-by: Dmitry Kozlyuk 
> Acked-by: Thomas Monjalon 

The fix lgtm, thanks.

There is a separate issue I spotted while reviewing.
I'll send a separate fix.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v1] vhost: add sanity check for resubmiting reqs in split ring

2021-10-14 Thread Li Feng
Thank you for your response.

On Thu, Oct 14, 2021 at 4:17 PM Maxime Coquelin
 wrote:
>
> Hi Li,
>
> Adding Jin Yu who introduced this function.
>
> On 8/27/21 07:12, Li Feng wrote:
> > When getting reqs from the avail ring, the id may exceed inflight
> > queue size. Then the dpdk will crash forever.
>
> You need to add Fixes tag and Cc sta...@dpdk.org so that it can be
> backported.
OK, I will send the v2 version.

>
> > Signed-off-by: Li Feng 
> > ---
> >   lib/vhost/vhost_user.c | 10 --
> >   1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > index 29a4c9af60..f09d0f6a48 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -1823,8 +1823,14 @@ vhost_check_queue_inflights_split(struct virtio_net 
> > *dev,
> >   last_io = inflight_split->last_inflight_io;
> >
> >   if (inflight_split->used_idx != used->idx) {
> > - inflight_split->desc[last_io].inflight = 0;
> > - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> > + if (unlikely(last_io >= inflight_split->desc_num)) {
> > + VHOST_LOG_CONFIG(ERR, "last_inflight_io '%"PRIu16"' 
> > exceeds inflight "
> > + "queue size (%"PRIu16").\n", last_io,
> > + inflight_split->desc_num);
>
> If such error happens, shouldn't we return RTE_VHOST_MSG_RESULT_ERR
> instead of just logging an error?
I think ignoring the error is ok. No one could handle this error correctly.
At this time the guest virtio driver of this virtqueue may be in an
incorrect state.

>
> > + } else {
> > + inflight_split->desc[last_io].inflight = 0;
> > + rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> > + }
> >   inflight_split->used_idx = used->idx;
> >   }
> >
> >
>
> Regards,
> Maxime
>
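
To make the discussed check concrete, here is a small Python stand-in for the C logic. It is a sketch only: the list plays the role of inflight_split->desc, the function name is hypothetical, and the boolean return reflects the reviewer's suggestion to report an error rather than what the patch itself does.

```python
def clear_last_inflight(inflight_flags, last_io):
    """Clear the inflight flag at last_io only if the index is in range.

    Returns False (and leaves the list untouched) when last_io would
    index past the end of the inflight queue.
    """
    if last_io >= len(inflight_flags):
        print("last_inflight_io '{}' exceeds inflight queue size ({})."
              .format(last_io, len(inflight_flags)))
        return False
    inflight_flags[last_io] = 0
    return True


flags = [1, 1, 1, 1]
print(clear_last_inflight(flags, 2), flags)
print(clear_last_inflight(flags, 7), flags)
```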


[dpdk-dev] [PATCH v10 0/5] Add PIE support for HQoS library

2021-10-14 Thread Liguzinski, WojciechX
The DPDK sched library is equipped with a mechanism that protects it from
the bufferbloat problem, a situation in which excess buffers in the network
cause high latency and latency variation. Currently, it supports RED for
active queue management. RED is designed to control the queue length, but it
does not control latency directly and is now being obsoleted. More advanced
queue management is required to address this problem and provide the desired
quality of service to users.

This solution (RFC) proposes the use of a new algorithm called "PIE"
(Proportional Integral controller Enhanced) that can effectively and directly
control queuing latency to address the bufferbloat problem.

The implementation of this functionality modifies existing data structures in
the library, adds a new set of data structures, and adds PIE-related APIs.
This affects structures in the public API/ABI, which is why a deprecation
notice is going to be prepared and sent.
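
To give a feel for how PIE controls latency directly, here is a deliberately simplified sketch of the periodic drop-probability update. It is not the code from this series: the constants are the defaults as I recall them from RFC 8033 and should be treated as assumptions, and the auto-tuning and burst-allowance steps of the RFC are omitted.

```python
# Assumed defaults (recalled from RFC 8033, not taken from this series).
ALPHA = 0.125        # weight of the deviation from the target delay
BETA = 1.25          # weight of the delay trend since the last update
QDELAY_REF = 0.015   # target queuing delay, in seconds


def update_drop_prob(drop_prob, qdelay, qdelay_old):
    """Return the new drop probability after one update interval.

    qdelay and qdelay_old are the current and previous queuing delay
    estimates in seconds. The result is clamped to [0, 1].
    """
    p = ALPHA * (qdelay - QDELAY_REF) + BETA * (qdelay - qdelay_old)
    return min(1.0, max(0.0, drop_prob + p))


# Delay doubled from the target since the last update: probability rises.
print(update_drop_prob(0.0, 0.030, 0.015))
# Queue drained: probability is clamped at zero.
print(update_drop_prob(0.0, 0.0, 0.015))
```

The first term pushes the probability up while the delay is above the target, and the second reacts to whether the delay is growing or shrinking, which is what lets the controller act on latency rather than on queue length.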

Liguzinski, WojciechX (5):
  sched: add PIE based congestion management
  example/qos_sched: add PIE support
  example/ip_pipeline: add PIE support
  doc/guides/prog_guide: added PIE
  app/test: add tests for PIE

 app/test/autotest_data.py|   18 +
 app/test/meson.build |4 +
 app/test/test_pie.c  | 1065 ++
 config/rte_config.h  |1 -
 doc/guides/prog_guide/glossary.rst   |3 +
 doc/guides/prog_guide/qos_framework.rst  |   60 +-
 doc/guides/prog_guide/traffic_management.rst |   13 +-
 drivers/net/softnic/rte_eth_softnic_tm.c |6 +-
 examples/ip_pipeline/tmgr.c  |  142 +--
 examples/qos_sched/app_thread.c  |1 -
 examples/qos_sched/cfg_file.c|   82 +-
 examples/qos_sched/init.c|   27 +-
 examples/qos_sched/main.h|3 +
 examples/qos_sched/profile.cfg   |  196 ++--
 lib/sched/meson.build|   10 +-
 lib/sched/rte_pie.c  |   86 ++
 lib/sched/rte_pie.h  |  398 +++
 lib/sched/rte_sched.c|  241 ++--
 lib/sched/rte_sched.h|   63 +-
 lib/sched/version.map|3 +
 20 files changed, 2154 insertions(+), 268 deletions(-)
 create mode 100644 app/test/test_pie.c
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

-- 
2.25.1



[dpdk-dev] [PATCH v10 1/5] sched: add PIE based congestion management

2021-10-14 Thread Liguzinski, WojciechX
Implement PIE-based congestion management as described in RFC 8033.

Signed-off-by: Liguzinski, WojciechX 
---
 drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
 lib/sched/meson.build|  10 +-
 lib/sched/rte_pie.c  |  82 +
 lib/sched/rte_pie.h  | 393 +++
 lib/sched/rte_sched.c| 241 +-
 lib/sched/rte_sched.h|  63 +++-
 lib/sched/version.map|   3 +
 7 files changed, 702 insertions(+), 96 deletions(-)
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

diff --git a/drivers/net/softnic/rte_eth_softnic_tm.c b/drivers/net/softnic/rte_eth_softnic_tm.c
index 90baba15ce..e74092ce7f 100644
--- a/drivers/net/softnic/rte_eth_softnic_tm.c
+++ b/drivers/net/softnic/rte_eth_softnic_tm.c
@@ -420,7 +420,7 @@ pmd_tm_node_type_get(struct rte_eth_dev *dev,
return 0;
 }
 
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_CMAN
 #define WRED_SUPPORTED 1
 #else
 #define WRED_SUPPORTED 0
@@ -2306,7 +2306,7 @@ tm_tc_wred_profile_get(struct rte_eth_dev *dev, uint32_t 
tc_id)
return NULL;
 }
 
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_CMAN
 
 static void
 wred_profiles_set(struct rte_eth_dev *dev, uint32_t subport_id)
@@ -2321,7 +2321,7 @@ wred_profiles_set(struct rte_eth_dev *dev, uint32_t 
subport_id)
for (tc_id = 0; tc_id < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; tc_id++)
for (color = RTE_COLOR_GREEN; color < RTE_COLORS; color++) {
struct rte_red_params *dst =
-   &pp->red_params[tc_id][color];
+   &pp->cman_params->red_params[tc_id][color];
struct tm_wred_profile *src_wp =
tm_tc_wred_profile_get(dev, tc_id);
struct rte_tm_red_params *src =
diff --git a/lib/sched/meson.build b/lib/sched/meson.build
index b24f7b8775..e7ae9bcf19 100644
--- a/lib/sched/meson.build
+++ b/lib/sched/meson.build
@@ -1,11 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c')
-headers = files(
-'rte_approx.h',
-'rte_red.h',
-'rte_sched.h',
-'rte_sched_common.h',
-)
+sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c', 'rte_pie.c')
+headers = files('rte_sched.h', 'rte_sched_common.h',
+   'rte_red.h', 'rte_approx.h', 'rte_pie.h')
 deps += ['mbuf', 'meter']
diff --git a/lib/sched/rte_pie.c b/lib/sched/rte_pie.c
new file mode 100644
index 00..2fcecb2db4
--- /dev/null
+++ b/lib/sched/rte_pie.c
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include 
+
+#include "rte_pie.h"
+#include 
+#include 
+#include 
+
+#ifdef __INTEL_COMPILER
+#pragma warning(disable:2259) /* conversion may lose significant bits */
+#endif
+
+void
+rte_pie_rt_data_init(struct rte_pie *pie)
+{
+   if (pie == NULL) {
+   /* Allocate memory to use the PIE data structure */
+   pie = rte_malloc(NULL, sizeof(struct rte_pie), 0);
+
+   if (pie == NULL)
+   RTE_LOG(ERR, SCHED, "%s: Memory allocation fails\n", 
__func__);
+   }
+
+   pie->active = 0;
+   pie->in_measurement = 0;
+   pie->departed_bytes_count = 0;
+   pie->start_measurement = 0;
+   pie->last_measurement = 0;
+   pie->qlen = 0;
+   pie->avg_dq_time = 0;
+   pie->burst_allowance = 0;
+   pie->qdelay_old = 0;
+   pie->drop_prob = 0;
+   pie->accu_prob = 0;
+}
+
+int
+rte_pie_config_init(struct rte_pie_config *pie_cfg,
+   const uint16_t qdelay_ref,
+   const uint16_t dp_update_interval,
+   const uint16_t max_burst,
+   const uint16_t tailq_th)
+{
+   uint64_t tsc_hz = rte_get_tsc_hz();
+
+   if (pie_cfg == NULL)
+   return -1;
+
+   if (qdelay_ref <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for qdelay_ref\n", __func__);
+   return -EINVAL;
+   }
+
+   if (dp_update_interval <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for dp_update_interval\n", 
__func__);
+   return -EINVAL;
+   }
+
+   if (max_burst <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for max_burst\n", __func__);
+   return -EINVAL;
+   }
+
+   if (tailq_th <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for tailq_th\n", __func__);
+   return -EINVAL;
+   }
+
+   pie_cfg->qdelay_ref = (tsc_hz * qdelay_ref) / 1000;
+   pie_cfg->dp_update_interval = (tsc_hz * dp_update_interval) / 1000;
+   pie_cfg->max_burst = (tsc_hz * max_
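
The millisecond-to-cycle conversion used for qdelay_ref and dp_update_interval above can be checked with a few lines of Python. This is only an arithmetic sketch; the function name and the 2 GHz TSC frequency in the example are assumptions, not values from the patch.

```python
def ms_to_tsc_cycles(tsc_hz, milliseconds):
    """Convert milliseconds to TSC cycles: (tsc_hz * ms) / 1000.

    Rejects non-positive inputs, like the parameter checks in the patch.
    """
    if milliseconds <= 0:
        raise ValueError("value in milliseconds must be positive")
    return (tsc_hz * milliseconds) // 1000


# Assumed 2 GHz TSC: 15 ms corresponds to 30,000,000 cycles.
print(ms_to_tsc_cycles(2_000_000_000, 15))
```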

[dpdk-dev] [PATCH v10 2/5] example/qos_sched: add PIE support

2021-10-14 Thread Liguzinski, WojciechX
This patch adds support for enabling PIE or RED by
parsing the config file.

Signed-off-by: Liguzinski, WojciechX 
---
 config/rte_config.h |   1 -
 examples/qos_sched/app_thread.c |   1 -
 examples/qos_sched/cfg_file.c   |  82 ++---
 examples/qos_sched/init.c   |  27 +++--
 examples/qos_sched/main.h   |   3 +
 examples/qos_sched/profile.cfg  | 196 +---
 6 files changed, 216 insertions(+), 94 deletions(-)

diff --git a/config/rte_config.h b/config/rte_config.h
index 590903c07d..48132f27df 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -89,7 +89,6 @@
 #define RTE_MAX_LCORE_FREQS 64
 
 /* rte_sched defines */
-#undef RTE_SCHED_RED
 #undef RTE_SCHED_COLLECT_STATS
 #undef RTE_SCHED_SUBPORT_TC_OV
 #define RTE_SCHED_PORT_N_GRINDERS 8
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index dbc878b553..895c0d3592 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -205,7 +205,6 @@ app_worker_thread(struct thread_conf **confs)
if (likely(nb_pkt)) {
int nb_sent = rte_sched_port_enqueue(conf->sched_port, 
mbufs,
nb_pkt);
-
APP_STATS_ADD(conf->stat.nb_drop, nb_pkt - nb_sent);
APP_STATS_ADD(conf->stat.nb_rx, nb_pkt);
}
diff --git a/examples/qos_sched/cfg_file.c b/examples/qos_sched/cfg_file.c
index cd167bd8e6..5e82866dce 100644
--- a/examples/qos_sched/cfg_file.c
+++ b/examples/qos_sched/cfg_file.c
@@ -242,20 +242,20 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct 
rte_sched_subport_params *subpo
memset(active_queues, 0, sizeof(active_queues));
n_active_queues = 0;
 
-#ifdef RTE_SCHED_RED
-   char sec_name[CFG_NAME_LEN];
-   struct rte_red_params 
red_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE][RTE_COLORS];
+#ifdef RTE_SCHED_CMAN
+   enum rte_sched_cman_mode cman_mode;
 
-   snprintf(sec_name, sizeof(sec_name), "red");
+   struct rte_red_params 
red_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE][RTE_COLORS];
 
-   if (rte_cfgfile_has_section(cfg, sec_name)) {
+   if (rte_cfgfile_has_section(cfg, "red")) {
+   cman_mode = RTE_SCHED_CMAN_WRED;
 
for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
char str[32];
 
/* Parse WRED min thresholds */
snprintf(str, sizeof(str), "tc %d wred min", i);
-   entry = rte_cfgfile_get_entry(cfg, sec_name, str);
+   entry = rte_cfgfile_get_entry(cfg, "red", str);
if (entry) {
char *next;
/* for each packet colour (green, yellow, red) 
*/
@@ -315,7 +315,42 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct 
rte_sched_subport_params *subpo
}
}
}
-#endif /* RTE_SCHED_RED */
+
+   struct rte_pie_params pie_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
+
+   if (rte_cfgfile_has_section(cfg, "pie")) {
+   cman_mode = RTE_SCHED_CMAN_PIE;
+
+   for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
+   char str[32];
+
+   /* Parse Queue Delay Ref value */
+   snprintf(str, sizeof(str), "tc %d qdelay ref", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].qdelay_ref = (uint16_t) 
atoi(entry);
+
+   /* Parse Max Burst value */
+   snprintf(str, sizeof(str), "tc %d max burst", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].max_burst = (uint16_t) 
atoi(entry);
+
+   /* Parse Update Interval Value */
+   snprintf(str, sizeof(str), "tc %d update interval", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].dp_update_interval = (uint16_t) 
atoi(entry);
+
+   /* Parse Tailq Threshold Value */
+   snprintf(str, sizeof(str), "tc %d tailq th", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].tailq_th = (uint16_t) atoi(entry);
+
+   }
+   }
+#endif /* RTE_SCHED_CMAN */
 
for (i = 0; i < MAX_SCHED_SUBPORTS; i++) {
char sec_name[CFG_NAME_LEN];
@@ -393,17 +428,30 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct 
rte_sched_subport_params *subpo
}
}
}
-#ifdef RTE_SC
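
For readers who want to experiment with the per-traffic-class keys parsed above ("tc N qdelay ref", "tc N max burst", "tc N update interval", "tc N tailq th"), here is a small sketch using Python's configparser. The sample values and the helper name are hypothetical, and the real profile.cfg layout may differ from this sample.

```python
import configparser

# Hypothetical sample; values are illustrative only.
SAMPLE = """
[pie]
tc 0 qdelay ref = 15
tc 0 max burst = 150
tc 0 update interval = 15
tc 0 tailq th = 64
"""

# Field name -> key suffix, following the snprintf() formats in the patch.
KEYS = {
    "qdelay_ref": "qdelay ref",
    "max_burst": "max burst",
    "dp_update_interval": "update interval",
    "tailq_th": "tailq th",
}


def load_pie_params(text, n_tc):
    """Return one dict of PIE parameters per traffic class, or None."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    if not cfg.has_section("pie"):
        return None
    params = []
    for tc in range(n_tc):
        entry = {}
        for field, suffix in KEYS.items():
            key = "tc {} {}".format(tc, suffix)
            if cfg.has_option("pie", key):
                entry[field] = cfg.getint("pie", key)
        params.append(entry)
    return params


print(load_pie_params(SAMPLE, 2))
```

Unlike the atoi() calls in the C code, getint() raises an error on malformed numbers, which may or may not be the behaviour wanted in the example application.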
