Re: rte_atomic_*_explicit

2024-01-26 Thread Mattias Rönnblom

On 2024-01-25 23:10, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Thursday, 25 January 2024 19.54

Why do rte_stdatomic.h functions have the suffix "_explicit"?
Especially
since there aren't any wrappers for the implicit variants.

More to type, more to read.


They have the "_explicit" suffix to make their names similar to those in 
stdatomic.h.



OK, so to avoid a situation where someone accidentally misinterprets
rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed);
as what, exactly?

If you have a wrapper, why not take the opportunity and reap some 
benefits from this and fix/extend the standard API, making it a better 
fit for your application? Like removing the redundant "_explicit", 
adding "fetch"-less add/sub, and maybe a function for single-writer atomic add 
(non-atomic read + non-atomic add + atomic store), etc.
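
As an illustration of the last point, a rough sketch of what such a
single-writer add could look like on top of the existing load/store wrappers
(the helper name is made up, not an existing DPDK API):

static inline void
rte_counter_add_single_writer(RTE_ATOMIC(uint64_t) *counter, uint64_t val)
{
        /* Safe only when exactly one thread ever updates the counter:
         * a plain read-modify followed by an atomic store avoids the
         * cost of a full atomic read-modify-write, while readers still
         * get consistent values via atomic loads. */
        uint64_t cur = rte_atomic_load_explicit(counter, rte_memory_order_relaxed);

        rte_atomic_store_explicit(counter, cur + val, rte_memory_order_relaxed);
}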


Prohibiting implicit operations was already done, so it's already not a 
straight-off copy of the standard API.



You might consider their existence somewhat temporary until C11 stdatomics can 
be fully phased in, so there's another argument for similar names. (This 
probably does not happen as long as compilers generate slower code for C11 
stdatomics than with their atomic built-ins.)



To me, it seems reasonable that a wrapper API should stay as long as it 
provides a benefit over the standard API. One could imagine that being a 
long time.


I imagine some DPDK developers being tired of migrating from one atomics 
API to another: <rte_atomic.h> -> GCC built-ins (-> attempted C11 
stdatomics?) -> <rte_stdatomic.h>. Now you are adding a future "-> CXY 
atomics" move as well, and with that it also seems natural to add a 
change back to a wrapper or complementary API, when CXY doesn't turn 
out good enough for some particular platform, or when some non-compliant 
compiler comes along.


I suggested fixing the original <rte_atomic.h> API, or at least having a 
wrapper API, already at the point DPDK moved to direct GCC built-in 
calls. Then we wouldn't have had this atomics API ping-pong.




When was this API introduced? Shouldn't it say "experimental"
somewhere?


They were introduced as part of the migration to C11.
I suppose they were not marked experimental because they replaced something we 
didn't want anymore (the compiler built-ins for atomics, e.g. 
__atomic_load_n()). I don't recall if we discussed experimental marking or not.







Reverse paper trail:
https://git.dpdk.org/dpdk/log/lib/eal/include/rte_stdatomic.h
https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-send-email-roret...@linux.microsoft.com/
https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-send-email-roret...@linux.microsoft.com/



Re: rte_atomic_*_explicit

2024-01-26 Thread Mattias Rönnblom

On 2024-01-26 02:37, Honnappa Nagarahalli wrote:





On Thu, Jan 25, 2024 at 11:10:47PM +0100, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Thursday, 25 January 2024 19.54

Why do rte_stdatomic.h functions have the suffix "_explicit"?
Especially
since there aren't any wrappers for the implicit variants.

More to type, more to read.


They have the "_explicit" suffix to make their names similar to those in

stdatomic.h.


You might consider their existence somewhat temporary until C11 stdatomics

can be fully phased in, so there's another argument for similar names. (This
probably does not happen as long as compilers generate slower code for C11
stdatomics than with their atomic built-ins.)

yes, there was feedback at the time it was.

* we should *not* have non-explicit versions of the macros
* the atomic generic functions should be named to match C11 standard
   with a rte_ prefix.

This was mainly done to ensure that users think through the memory ordering 
they want to use. This also matches with the compiler atomic built-ins. Without 
explicit, it is sequentially consistent memory order.



"This" is referring to the first bullet only, correct?

You don't have to distinguish between implicit and explicit if you only 
have explicit.








When was this API introduced? Shouldn't it say "experimental"
somewhere?


They were introduced as part of the migration to C11.
I suppose they were not marked experimental because they replaced

something we didn't want anymore (the compiler built-ins for atomics, e.g.
__atomic_load_n()). I don't recall if we discussed experimental marking or not.

i don't think we discussed it since they're wrapper macros.




Reverse paper trail:
https://git.dpdk.org/dpdk/log/lib/eal/include/rte_stdatomic.h
https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-

send-email-roret...@linux.microsoft.com/

https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git-

send-email-roret...@linux.microsoft.com/




RE: [PATCH 0/8] optimize the firmware loading process

2024-01-26 Thread Chaoyong He
> On 1/25/2024 2:06 AM, Chaoyong He wrote:
> >> On 1/15/2024 2:54 AM, Chaoyong He wrote:
> >>> This patch series aims to speed up the DPDK application start by
> >>> optimizing the firmware loading process in several places.
> >>> We also simplify the port name in the multiple-PF firmware case to
> >>> make the customer happy.
> >>>
> >>
> >> <...>
> >>
> >>>   net/nfp: add the elf module
> >>>   net/nfp: reload the firmware only when firmware changed
> >>
> >> The above commits add an elf parser capability, and the second one loads
> >> the firmware only when the build time is different.
> >>
> >> I can see this is an optimization effort, to understand FW status
> >> before loading FW, but relying on build time seems fragile. Does
> >> it help to add a new section to store version information and
> >> evaluate based
>  on this information?
> >>
> >
> > We have a branch of firmware (several app types combined with
> > NFD3/NFDk) with the same version information (published monthly), so
> > the version information can't help us, because we can't distinguish
> > among them.
> >
> > But the build time is different for every firmware, and that's the
> > reason we chose it.
> >
> 
>  If the version is the same although the FW itself is different, isn't this
>  a problem on its own?
>  Perhaps an additional field is required in the version syntax.
> >>>
> >>> Actually, it's just the role that the build time embedded in the firmware
> >>> plays, which is unique for every firmware, and which can't be modified
> >>> once the firmware is published.
> >>>
> >>
> >> I see it is already in the elf binary, but relying on the build time of a
> >> FW to decide whether to load it or not looks like a weak method to me, and
> >> fragile as stated before.
> >>
> >>> What you said also has its merits, but in practice we can't accept it (at
> >>> least we can't do it immediately); this needs to be discussed with the
> >>> firmware team.
> >>>
> >>
> >> As it is an optimization I assume it is not urgent, so would you mind
> >> discussing the issue with the FW team? Perhaps it can lead to a
> >> better solution; we can proceed after that.
> >
> > After discussing with the FW team, it seems we still can't just use the
> > version, because our firmware is published monthly.
> >
> > Between two public versions, we also have internal versions with an
> > 'rc0, rc1' scheme, just as DPDK does.
> > But the problem is, we may compile and share many firmware images in daily
> > development.
> > They may only be valid once and will never be published, so we won't
> > assign a version to them.
> >
> 
> So is my understanding correct that this effort is not for customers and not
> for published FWs, but mostly for developers with development FWs?
> 
> If so, perhaps it can be simplified even more: there can be a devarg that
> force-loads the FW, for the development scenario.
> By default, the driver checks the FW version and loads accordingly, but if the
> flag is set it loads the FW even with the same version. When a new development
> FW is out, the developer can run the DPDK app with this parameter and it loads
> the FW to test. Does this make sense?

Yeah, it's a perfect method, thanks.

> 
> I am not trying to be difficult; the proposed approach just didn't satisfy me,
> and I am trying to understand it and help if I can. And if you please continue
> the discussion in the mailing list, it enables others to comment and perhaps
> provide a better option.
> 
> 
> > So the build time is the only information by which we can distinguish them;
> > if we just use the version, the PMD will not reload the firmware that we
> > need to test and debug.
> >
> > How about we first use the version to check, and then use the build time to
> > check within the same version?
> >
> 
> Nope, I thought the elf parsing and relying on the build time could be removed;
> if you have them anyway, there is no need to add the version check, you can use
> just the build time to check.
> 
> 
> And another related question: if developers are using daily FWs with the exact
> same version, when a developer hits an unexpected behavior, how will she know
> exactly which version of the FW is in the device? This may be required to
> track down the problem. Shouldn't each FW have a unique identifier, other than
> the "build time" info, which is stored in the elf binary, not the FW itself?

We will use the devarg and version as you advise, and remove the build time
logic.

> 
> >>
> >> Meanwhile I will continue with remaining patches, excluding these two
> >> patches.
> >>
> >>> If you insist that we should change the design, maybe we can just
> >>> drop these
> >> two commits, and upstream the other commits?
> >>> - net/nfp: add the elf module
> >>> - net/nfp: reload the firmware only when firmware changed
> >



Re: [RFC] service: extend service function call statistics

2024-01-26 Thread Mattias Rönnblom

On 2024-01-26 00:19, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
Sent: Thursday, 25 January 2024 20.15

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
counting all invocations, regardless of return value).

The new statistics may be useful for both debugging and profiling
(e.g., calculate the average per-call processing latency for non-idle
service calls).

Service core tests are extended to cover the new counters, and
coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
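
As an illustration of the profiling use case mentioned above, a rough sketch
(assuming the attribute names proposed in this RFC; only
RTE_SERVICE_ATTR_IDLE_CALL_COUNT is new, the others already exist):

static uint64_t
service_avg_busy_cycles(uint32_t service_id)
{
        uint64_t calls = 0, idle = 0, cycles = 0;

        rte_service_attr_get(service_id, RTE_SERVICE_ATTR_CALL_COUNT, &calls);
        rte_service_attr_get(service_id, RTE_SERVICE_ATTR_IDLE_CALL_COUNT, &idle);
        rte_service_attr_get(service_id, RTE_SERVICE_ATTR_CYCLES, &cycles);

        /* average TSC cycles per call that actually performed work */
        return calls > idle ? cycles / (calls - idle) : 0;
}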


OK to all of the above. Good stuff.



The documentation for the CYCLES attributes is updated to reflect
their actual semantics.


If this is intended behavior, then updating the documentation seems appropriate 
- I would even go so far as considering it a bug fix.

However, quite a few cycles may be consumed by a service before it can conclude 
that it had no work to do. Shouldn't that be considered time spent by the 
service? I.e. should the code be fixed instead of the documentation?



Generally, polling for new work in the service is cheap, in my 
experience. But there's nothing in principle that prevents the situation 
you describe from occurring. You could add an "IDLE_CYCLES" counter in 
case that would ever be a real-world problem.


That wouldn't be a fix, but rather just returning to the old, subtly 
broken, (pre-22.11?) semantics.


Have a look at 809bd24 to see the rationale for the change. There's an 
example in 4689c57.


The cause of this ambiguity is due to the fact that the platform/OS 
(i.e., DPDK) doesn't know which service to "wake up" (which in turn is 
the result of the platform not being able to track input sources, 
like NIC RX queues or timer wheels) and thus must ask every service to 
check if it has something to do.



Alternatively, keep the behavior (for backwards compatibility) and fix the 
documentation, as this patch does, and add an IDLE_CYCLES counter for time 
spent in idle calls.

PS: We're not using DPDK service cores in our applications, so I'm not familiar 
with the details. We are using something somewhat similar (but homegrown), also 
for profiling and power management purposes, and my feedback is based on my 
experience with our own variant of service cores.



When are you making the switch to service cores? :)


Either way:

Acked-by: Morten Brørup 



Thanks for the review.


[PATCH 0/2] dma/skeleton: add support for SG copy and fill ops

2024-01-26 Thread Chengwen Feng
This patchset adds support for SG copy and fill ops. Tests passed with
[1].

[1] 
https://patchwork.dpdk.org/project/dpdk/cover/cover.1700156485.git.gmuthukri...@marvell.com/

Chengwen Feng (2):
  dma/skeleton: support SG copy ops
  dma/skeleton: support fill ops

 drivers/dma/skeleton/skeleton_dmadev.c | 147 +++--
 drivers/dma/skeleton/skeleton_dmadev.h |  44 ++--
 2 files changed, 171 insertions(+), 20 deletions(-)

-- 
2.17.1



[PATCH 2/2] dma/skeleton: support fill ops

2024-01-26 Thread Chengwen Feng
Add support for fill operation.

Signed-off-by: Chengwen Feng 
---
 drivers/dma/skeleton/skeleton_dmadev.c | 53 +++---
 drivers/dma/skeleton/skeleton_dmadev.h | 16 +---
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/skeleton/skeleton_dmadev.c 
b/drivers/dma/skeleton/skeleton_dmadev.c
index d1d257a064..48f88f9fc1 100644
--- a/drivers/dma/skeleton/skeleton_dmadev.c
+++ b/drivers/dma/skeleton/skeleton_dmadev.c
@@ -38,7 +38,8 @@ skeldma_info_get(const struct rte_dma_dev *dev, struct 
rte_dma_info *dev_info,
dev_info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM |
 RTE_DMA_CAPA_SVA |
 RTE_DMA_CAPA_OPS_COPY |
-RTE_DMA_CAPA_OPS_COPY_SG;
+RTE_DMA_CAPA_OPS_COPY_SG |
+RTE_DMA_CAPA_OPS_FILL;
dev_info->max_vchans = 1;
dev_info->max_desc = SKELDMA_MAX_DESC;
dev_info->min_desc = SKELDMA_MIN_DESC;
@@ -100,8 +101,19 @@ do_copy_sg(struct skeldma_desc *desc)
}
 }
 
+static inline void
+do_fill(struct skeldma_desc *desc)
+{
+   uint8_t *fills = (uint8_t *)&desc->fill.pattern;
+   uint8_t *dst = (uint8_t *)desc->fill.dst;
+   uint32_t i;
+
+   for (i = 0; i < desc->fill.len; i++)
+   dst[i] = fills[i % 8];
+}
+
 static uint32_t
-cpucopy_thread(void *param)
+cpuwork_thread(void *param)
 {
 #define SLEEP_THRESHOLD1
 #define SLEEP_US_VAL   10
@@ -127,6 +139,8 @@ cpucopy_thread(void *param)
rte_memcpy(desc->copy.dst, desc->copy.src, 
desc->copy.len);
else if (desc->op == SKELDMA_OP_COPY_SG)
do_copy_sg(desc);
+   else if (desc->op == SKELDMA_OP_FILL)
+   do_fill(desc);
 
__atomic_fetch_add(&hw->completed_count, 1, __ATOMIC_RELEASE);
(void)rte_ring_enqueue(hw->desc_completed, (void *)desc);
@@ -162,7 +176,7 @@ skeldma_start(struct rte_dma_dev *dev)
 * 1) fflush pending/running/completed ring to empty ring.
 * 2) init ring idx to zero.
 * 3) init running statistics.
-* 4) mark cpucopy task exit_flag to false.
+* 4) mark cpuwork task exit_flag to false.
 */
fflush_ring(hw, hw->desc_pending);
fflush_ring(hw, hw->desc_running);
@@ -178,9 +192,9 @@ skeldma_start(struct rte_dma_dev *dev)
 
snprintf(name, sizeof(name), "dma-skel%d", dev->data->dev_id);
ret = rte_thread_create_internal_control(&hw->thread, name,
-   cpucopy_thread, dev);
+   cpuwork_thread, dev);
if (ret) {
-   SKELDMA_LOG(ERR, "Start cpucopy thread fail!");
+   SKELDMA_LOG(ERR, "Start cpuwork thread fail!");
return -EINVAL;
}
 
@@ -462,6 +476,34 @@ skeldma_copy_sg(void *dev_private, uint16_t vchan,
return hw->ridx++;
 }
 
+static int
+skeldma_fill(void *dev_private, uint16_t vchan,
+uint64_t pattern, rte_iova_t dst,
+uint32_t length, uint64_t flags)
+{
+   struct skeldma_hw *hw = dev_private;
+   struct skeldma_desc *desc;
+   int ret;
+
+   RTE_SET_USED(vchan);
+
+   ret = rte_ring_dequeue(hw->desc_empty, (void **)&desc);
+   if (ret)
+   return -ENOSPC;
+   desc->op = SKELDMA_OP_FILL;
+   desc->ridx = hw->ridx;
+   desc->fill.dst = (void *)(uintptr_t)dst;
+   desc->fill.len = length;
+   desc->fill.pattern = pattern;
+   if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+   submit(hw, desc);
+   else
+   (void)rte_ring_enqueue(hw->desc_pending, (void *)desc);
+   hw->submitted_count++;
+
+   return hw->ridx++;
+}
+
 static int
 skeldma_submit(void *dev_private, uint16_t vchan)
 {
@@ -573,6 +615,7 @@ skeldma_create(const char *name, struct rte_vdev_device 
*vdev, int lcore_id)
dev->fp_obj->dev_private = dev->data->dev_private;
dev->fp_obj->copy = skeldma_copy;
dev->fp_obj->copy_sg = skeldma_copy_sg;
+   dev->fp_obj->fill = skeldma_fill;
dev->fp_obj->submit = skeldma_submit;
dev->fp_obj->completed = skeldma_completed;
dev->fp_obj->completed_status = skeldma_completed_status;
diff --git a/drivers/dma/skeleton/skeleton_dmadev.h 
b/drivers/dma/skeleton/skeleton_dmadev.h
index 7d32dd5095..c9bf3153ba 100644
--- a/drivers/dma/skeleton/skeleton_dmadev.h
+++ b/drivers/dma/skeleton/skeleton_dmadev.h
@@ -16,6 +16,7 @@
 enum skeldma_op {
SKELDMA_OP_COPY,
SKELDMA_OP_COPY_SG,
+   SKELDMA_OP_FILL,
 };
 
 struct skeldma_desc {
@@ -34,14 +35,19 @@ struct skeldma_desc {
uint16_t nb_src;
uint16_t nb_dst;
} copy_sg;
+   struct {
+   void *dst;
+   uint32_t len;
+   uint64_t pattern;
+   } 

[PATCH 1/2] dma/skeleton: support SG copy ops

2024-01-26 Thread Chengwen Feng
Add support for scatter-gather copy.

Signed-off-by: Chengwen Feng 
---
 drivers/dma/skeleton/skeleton_dmadev.c | 96 --
 drivers/dma/skeleton/skeleton_dmadev.h | 28 ++--
 2 files changed, 113 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/skeleton/skeleton_dmadev.c 
b/drivers/dma/skeleton/skeleton_dmadev.c
index eab03852dd..d1d257a064 100644
--- a/drivers/dma/skeleton/skeleton_dmadev.c
+++ b/drivers/dma/skeleton/skeleton_dmadev.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2021 HiSilicon Limited
+ * Copyright(c) 2021-2024 HiSilicon Limited
  */
 
 #include 
@@ -37,10 +37,12 @@ skeldma_info_get(const struct rte_dma_dev *dev, struct 
rte_dma_info *dev_info,
 
dev_info->dev_capa = RTE_DMA_CAPA_MEM_TO_MEM |
 RTE_DMA_CAPA_SVA |
-RTE_DMA_CAPA_OPS_COPY;
+RTE_DMA_CAPA_OPS_COPY |
+RTE_DMA_CAPA_OPS_COPY_SG;
dev_info->max_vchans = 1;
dev_info->max_desc = SKELDMA_MAX_DESC;
dev_info->min_desc = SKELDMA_MIN_DESC;
+   dev_info->max_sges = SKELDMA_MAX_SGES;
 
return 0;
 }
@@ -55,6 +57,49 @@ skeldma_configure(struct rte_dma_dev *dev, const struct 
rte_dma_conf *conf,
return 0;
 }
 
+static inline void
+do_copy_sg_one(struct rte_dma_sge *src, struct rte_dma_sge *dst, uint16_t 
nb_dst, uint64_t offset)
+{
+   uint32_t src_off = 0, dst_off = 0;
+   uint32_t copy_len = 0;
+   uint64_t tmp = 0;
+   uint16_t i;
+
+   /* Locate the segment from which the copy is started. */
+   for (i = 0; i < nb_dst; i++) {
+   tmp += dst[i].length;
+   if (offset < tmp) {
+   copy_len = tmp - offset;
+   dst_off = dst[i].length - copy_len;
+   break;
+   }
+   }
+
+   for (/* Use the above index */; i < nb_dst; i++, copy_len = 
dst[i].length) {
+   copy_len = RTE_MIN(copy_len, src->length - src_off);
+   rte_memcpy((uint8_t *)(uintptr_t)dst[i].addr + dst_off,
+  (uint8_t *)(uintptr_t)src->addr + src_off,
+  copy_len);
+   src_off += copy_len;
+   if (src_off >= src->length)
+   break;
+   dst_off = 0;
+   }
+}
+
+static inline void
+do_copy_sg(struct skeldma_desc *desc)
+{
+   uint64_t offset = 0;
+   uint16_t i;
+
+   for (i = 0; i < desc->copy_sg.nb_src; i++) {
+   do_copy_sg_one(&desc->copy_sg.src[i], desc->copy_sg.dst,
+  desc->copy_sg.nb_dst, offset);
+   offset += desc->copy_sg.src[i].length;
+   }
+}
+
 static uint32_t
 cpucopy_thread(void *param)
 {
@@ -76,9 +121,13 @@ cpucopy_thread(void *param)
rte_delay_us_sleep(SLEEP_US_VAL);
continue;
}
-
hw->zero_req_count = 0;
-   rte_memcpy(desc->dst, desc->src, desc->len);
+
+   if (desc->op == SKELDMA_OP_COPY)
+   rte_memcpy(desc->copy.dst, desc->copy.src, 
desc->copy.len);
+   else if (desc->op == SKELDMA_OP_COPY_SG)
+   do_copy_sg(desc);
+
__atomic_fetch_add(&hw->completed_count, 1, __ATOMIC_RELEASE);
(void)rte_ring_enqueue(hw->desc_completed, (void *)desc);
}
@@ -368,10 +417,42 @@ skeldma_copy(void *dev_private, uint16_t vchan,
ret = rte_ring_dequeue(hw->desc_empty, (void **)&desc);
if (ret)
return -ENOSPC;
-   desc->src = (void *)(uintptr_t)src;
-   desc->dst = (void *)(uintptr_t)dst;
-   desc->len = length;
+   desc->op = SKELDMA_OP_COPY;
+   desc->ridx = hw->ridx;
+   desc->copy.src = (void *)(uintptr_t)src;
+   desc->copy.dst = (void *)(uintptr_t)dst;
+   desc->copy.len = length;
+   if (flags & RTE_DMA_OP_FLAG_SUBMIT)
+   submit(hw, desc);
+   else
+   (void)rte_ring_enqueue(hw->desc_pending, (void *)desc);
+   hw->submitted_count++;
+
+   return hw->ridx++;
+}
+
+static int
+skeldma_copy_sg(void *dev_private, uint16_t vchan,
+   const struct rte_dma_sge *src,
+   const struct rte_dma_sge *dst,
+   uint16_t nb_src, uint16_t nb_dst,
+   uint64_t flags)
+{
+   struct skeldma_hw *hw = dev_private;
+   struct skeldma_desc *desc;
+   int ret;
+
+   RTE_SET_USED(vchan);
+
+   ret = rte_ring_dequeue(hw->desc_empty, (void **)&desc);
+   if (ret)
+   return -ENOSPC;
+   desc->op = SKELDMA_OP_COPY_SG;
desc->ridx = hw->ridx;
+   memcpy(desc->copy_sg.src, src, sizeof(*src) * nb_src);
+   memcpy(desc->copy_sg.dst, dst, sizeof(*dst) * nb_dst);
+   desc->copy_sg.nb_src = nb_src;
+   desc->copy_sg.nb_dst = nb_dst;
if (flags & RTE_DMA_O

Re: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned

2024-01-26 Thread Mattias Rönnblom

On 2024-01-25 23:53, Morten Brørup wrote:

From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
Sent: Thursday, 25 January 2024 19.37

ping.

Please review this thread if you have time, the main point of
discussion
I would like to receive consensus on the following questions.

1. Should we continue to expand common alignments behind an __rte_macro

   i.e. what do we prefer to appear in code

   alignas(RTE_CACHE_LINE_MIN_SIZE)

   -- or --

   __rte_cache_aligned

One of the benefits of dropping the macro is it provides a clear visual 
indicator that it is not placed in the same location or applied 
to types as is done with __attribute__((__aligned__(n))).


We don't want our own proprietary variant of something that already exists in 
the C standard. Now that we have moved to C11, the __rte alignment macros 
should be considered obsolete.


Making something cache-line aligned is not in C11.

__rte_cache_aligned is shorter, provides a tiny bit of abstraction, and 
is already an established DPDK standard. So just keep the macro. If it 
would change, I would argue for it to be changed to rte_cache_aligned 
(i.e., just moving it out of __ namespace, and maybe making it 
all-uppercase).


Non-trivial C programs wrap things all the time, standard or not. It's 
not something to be overly concerned about, imo.




Note: I don't mind convenience macros for common use cases, so we could also 
introduce the macro suggested by Mattias [1]:

#define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)

[1]: 
https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e-b31ec10c0...@lysator.liu.se/
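
For illustration, a usage sketch of such a convenience macro on a made-up
struct (the struct and field names are purely illustrative):

/* RTE_CACHE_ALIGNAS expands to alignas(RTE_CACHE_LINE_SIZE), so it goes
 * where alignas() is valid, i.e. in front of the declared member. */
struct worker_stats {
        RTE_CACHE_ALIGNAS uint64_t rx_pkts; /* starts a new cache line */
        uint64_t rx_bytes;
        RTE_CACHE_ALIGNAS uint64_t tx_pkts; /* avoid false sharing with rx */
        uint64_t tx_bytes;
};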



2. where should we place alignas(n) or __rte_macro (if we use a macro)

Should it be on the same line as the variable or field, or on the
preceding line?

   /* same line example struct */
   struct T {
   /* alignas(64) applies to field0 *not* struct T type declaration
*/
   alignas(64) void *field0;
   void *field1;

   ... other fields ...

   alignas(64) uint64_t field5;
   uint32_t field6;

   ... more fields ...

   };

   /* same line example array */
   alignas(64) static const uint32_t array[4] = { ... };

   -- or --

   /* preceeding line example struct */
   struct T {
   /* alignas(64) applies to field0 *not* struct T type declaration
*/
   alignas(64)
   void *field0;
   void *field1;

   ... other fields ...

   alignas(64)
   uint64_t field5;
   uint32_t field6;

   ... more fields ...

   };

   /* preceeding line example array */
   alignas(64)
   static const uint32_t array[4] = { ... };



Searching the net for what other projects do, I came across this required 
placement [2]:

uint64_t alignas(64) field5;

[2]: https://lore.kernel.org/buildroot/2023073851.6faa3391@windsurf/T/

So let's follow the standard's intention and put them on the same line.
On an case-by-case basis, we can wrap lines if it improves readability, like we 
do with function headers that have a lot of attributes.



I'll submit patches for lib/* once the discussion is concluded.

thanks folks




RE: [RFC] service: extend service function call statistics

2024-01-26 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 09.28
> 
> On 2024-01-26 00:19, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> >> Sent: Thursday, 25 January 2024 20.15
> >>
> >> Add two new per-service counters.
> >>
> >> RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service
> function
> >> invocations where no work was performed.
> >>
> >> RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number invocations
> >> resulting in an error.
> >>
> >> The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
> >> counting all invocations, regardless of return value).
> >>
> >> The new statistics may be useful for both debugging and profiling
> >> (e.g., calculate the average per-call processing latency for non-
> idle
> >> service calls).
> >>
> >> Service core tests are extended to cover the new counters, and
> >> coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.
> >
> > OK to all of the above. Good stuff.
> >
> >>
> >> The documentation for the CYCLES attributes are updated to reflect
> >> their actual semantics.
> >
> > If this is intended behavior, then updating the documentation seems
> appropriate - I would even go so far as considering it a bug fix.
> >
> > However, quite a few cycles may be consumed by a service before it
> can conclude that it had no work to do. Shouldn't that be considered
> time spent by the service? I.e. should the code be fixed instead of the
> documentation?
> >
> 
> Generally, polling for new work in the service is cheap, in my
> experience. But there's nothing in principle that prevents the
> situation
> your describe from occurring. You could add an "IDLE_CYCLES" counter in
> case that would ever be a real-world problem.
> 
> That wouldn't be a fix, but rather just returning to the old, subtly
> broken, (pre-22.11?) semantics.
> 
> Have a look at 809bd24 to see the rationale for the change. There's an
> example in 4689c57.
> 
> The cause of this ambiguity is due to the fact that the platform/OS
> (i.e., DPDK) doesn't know which service to "wake up" (which in turn is
> the result of the platform not being able to tracking input sources,
> like NIC RX queues or timer wheels) and thus must ask every service to
> check if it has something to do.

OK. Makes good sense.
So definitely fix the documentation, not the code. :-)

> 
> > Alternatively, keep the behavior (for backwards compatibility) and
> fix the documentation, as this patch does, and add an IDLE_CYCLES
> counter for time spent in idle calls.
> >
> > PS: We're not using DPDK service cores in our applications, so I'm
> not familiar with the details. We are using something somewhat similar
> (but homegrown), also for profiling and power management purposes, and
> my feedback is based on my experience with our own variant of service
> cores.
> >
> 
> When are you making the switch to service cores? :)

Our own "service cores" implementation has some slightly different properties, 
which we are still experimenting with.

E.g. in addition to the special return value "idle (no work to do)", we also 
have a special return value for "incomplete (more work urgently pending)" when 
a service processed a full burst and still has more work pending in its input 
queue.

We are also considering returning a value to inform what time it needs to be 
called again. This concept is only an idea, and we haven't started 
experimenting with it yet.


From a high level perspective, the service cores library is much like an 
operating system's CPU scheduler, although based on voluntary time sharing. 
Many algorithms and many parameters can be considered. It can also tie into 
power management and prioritization of different tasks.

> 
> > Either way:
> >
> > Acked-by: Morten Brørup 
> >
> 
> Thanks for the review.



RE: [PATCH v2 6/8] common/mlx5: fix calloc parameters

2024-01-26 Thread Dariusz Sosnowski
> -Original Message-
> From: Morten Brørup 
> Sent: Wednesday, January 24, 2024 20:01
> To: Ferruh Yigit ; Dariusz Sosnowski
> ; Slava Ovsiienko ; Ori
> Kam ; Suanming Mou ; Matan
> Azrad ; Dmitry Kozlyuk 
> Cc: dev@dpdk.org; sta...@dpdk.org; dkozl...@nvidia.com
> Subject: RE: [PATCH v2 6/8] common/mlx5: fix calloc parameters 
> 
> > From: Ferruh Yigit [mailto:ferruh.yi...@amd.com]
> > Sent: Wednesday, 24 January 2024 19.54
> >
> > gcc [1] generates warning [2] about calloc usage, because calloc
> > parameter order is wrong, fixing it by replacing parameters.
> >
> > [1]
> > gcc (GCC) 14.0.1 20240124 (experimental)
> >
> > [2]
> > Compiling C object .../common_mlx5_mlx5_common_mr.c.o
> > .../mlx5/mlx5_common_mr.c: In function ‘mlx5_mempool_get_chunks’:
> > .../common/mlx5/mlx5_common_mr.c:1384:29:
> >   warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier
> >   argument and not in the later argument [-Wcalloc-transposed-args]
> >  1384 | *out = calloc(sizeof(**out), n);
> >   | ^
> >
> > Fixes: 7297d2cdecce ("common/mlx5: fix external memory pool
> > registration")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Ferruh Yigit 
> > ---
> 
> Acked-by: Morten Brørup 

Acked-by: Dariusz Sosnowski 



RE: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned

2024-01-26 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 11.05
> 
> On 2024-01-25 23:53, Morten Brørup wrote:
> >> From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
> >> Sent: Thursday, 25 January 2024 19.37
> >>
> >> ping.
> >>
> >> Please review this thread if you have time, the main point of
> >> discussion
> >> I would like to receive consensus on the following questions.
> >>
> >> 1. Should we continue to expand common alignments behind an
> __rte_macro
> >>
> >>i.e. what do we prefer to appear in code
> >>
> >>alignas(RTE_CACHE_LINE_MIN_SIZE)
> >>
> >>-- or --
> >>
> >>__rte_cache_aligned
> >>
> >> One of the benefits of dropping the macro is it provides a clear
> visual
> >> indicator that it is not placed in the same location or get applied
> >> to types as is done with __attribute__((__aligned__(n))).
> >
> > We don't want our own proprietary variant of something that already
> exists in the C standard. Now that we have moved to C11, the __rte
> alignment macros should be considered obsolete.
> 
> Making something cache-line aligned is not in C11.

We are talking about the __rte_aligned() macro, not the cache alignment macro.

> 
> __rte_cache_aligned is shorter, provides a tiny bit of abstraction, and
> is already an established DPDK standard. So just keep the macro. If it
> would change, I would argue for it to be changed to rte_cache_aligned
> (i.e., just moving it out of __ namespace, and maybe making it
> all-uppercase).
> 
> Non-trivial C programs wrap things all the time, standard or not. It's
> not something to be overly concerned about, imo.

Using the cache alignment macro was obviously a bad example for discussing the 
__rte_aligned() macro.

FYI, Tyler later agreed to introducing the RTE_CACHE_ALIGNAS you had proposed 
in an earlier correspondence.

> 
> >
> > Note: I don't mind convenience macros for common use cases, so we
> could also introduce the macro suggested by Mattias [1]:
> >
> > #define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)
> >
> > [1]: https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e-
> b31ec10c0...@lysator.liu.se/
> >
> >>
> >> 2. where should we place alignas(n) or __rte_macro (if we use a
> macro)
> >>
> >> Should it be on the same line as the variable or field or on the
> >> preceeding line?
> >>
> >>/* same line example struct */
> >>struct T {
> >>/* alignas(64) applies to field0 *not* struct T type
> declaration
> >> */
> >>alignas(64) void *field0;
> >>void *field1;
> >>
> >>... other fields ...
> >>
> >>alignas(64) uint64_t field5;
> >>uint32_t field6;
> >>
> >>... more fields ...
> >>
> >>};
> >>
> >>/* same line example array */
> >>alignas(64) static const uint32_t array[4] = { ... };
> >>
> >>-- or --
> >>
> >>/* preceeding line example struct */
> >>struct T {
> >>/* alignas(64) applies to field0 *not* struct T type
> declaration
> >> */
> >>alignas(64)
> >>void *field0;
> >>void *field1;
> >>
> >>... other fields ...
> >>
> >>alignas(64)
> >>uint64_t field5;
> >>uint32_t field6;
> >>
> >>... more fields ...
> >>
> >>};
> >>
> >>/* preceeding line example array */
> >>alignas(64)
> >>static const uint32_t array[4] = { ... };
> >>
> >
> > Searching the net for what other projects do, I came across this
> required placement [2]:
> >
> > uint64_t alignas(64) field5;
> >
> > [2]:
> https://lore.kernel.org/buildroot/2023073851.6faa3391@windsurf/T/
> >
> > So let's follow the standard's intention and put them on the same
> line.
> > On an case-by-case basis, we can wrap lines if it improves
> readability, like we do with function headers that have a lot of
> attributes.
> >
> >>
> >> I'll submit patches for lib/* once the discussion is concluded.
> >>
> >> thanks folks
> >


RE: rte_atomic_*_explicit

2024-01-26 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 26 January 2024 09.07
> 
> On 2024-01-25 23:10, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >> Sent: Thursday, 25 January 2024 19.54
> >>
> >> Why do rte_stdatomic.h functions have the suffix "_explicit"?
> >> Especially
> >> since there aren't any wrappers for the implicit variants.
> >>
> >> More to type, more to read.
> >
> > They have the "_explicit" suffix to make their names similar to those
> in stdatomic.h.
> >
> 
> OK, so to avoid a situation where someone accidentally misinterprets
> rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed);
> as what, exactly?
> 
> If you have a wrapper, why not take the opportunity and reap some
> benefits from this and fix/extend the standard API, making it a better
> fit for your application. Like removing the redundant "_explicit",
> "fetch"-less add/sub, maybe and a function for single-writer atomic add
> (non-atomic read + non-atomic add + atomic store), etc.
> 
> Prohibiting implicit operations was done already, so it's already now
> not a straight-off copy of the standard API.
> 
> > You might consider their existence somewhat temporary until C11
> stdatomics can be fully phased in, so there's another argument for
> similar names. (This probably does not happen as long as compilers
> generate slower code for C11 stdatomics than with their atomic built-
> ins.)
> >
> 
> To me, it seems reasonable that a wrapper API should stay as long as it
> provides a benefit over the standard API. One could imagine that being a
> long time.
> 
> I imagine some DPDK developers being tired of migrating from one
> atomics API to another: <rte_atomic.h> -> GCC built-ins (-> attempted C11
> stdatomics?) -> <rte_stdatomic.h>. Now you are adding a future "-> CXY
> atomics" move as well, and with that it also seems natural to add a
> change back to a wrapper or complementary API, when CXY doesn't turn
> out good enough for some particular platform, or when some
> non-compliant compiler comes along.

Yes, more migrations seem to be on the roadmap.

We can take the opportunity to change direction now, and decide to keep the 
<rte_stdatomic.h> API long term.
Then it would need more documentation (basically copying function descriptions 
from <stdatomic.h>), and the functions could have the "_explicit" suffix 
removed (macros with the suffix could be added for backwards compatibility), 
and more functions - like the ones you suggested above - could be added.

What do people think?
1. Keep considering <rte_stdatomic.h> a temporary wrapper for <stdatomic.h> 
until compilers reach some undefined level of maturity, or
2. Consider <rte_stdatomic.h> stable, clean it up (remove the "_explicit" 
suffix), add documentation to the macros, and extend it.

Living in a DPDK-only environment, I would prefer option 2; but if mixing DPDK 
code with non-DPDK code (that uses <stdatomic.h>) it might be weird.
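
For illustration, a rough sketch of what option 2 could look like (illustrative
only, not the current <rte_stdatomic.h> implementation; shown for the build
that maps the wrappers to the GCC built-ins):

/* Make the short name the primary spelling... */
#define rte_atomic_fetch_add(ptr, val, memorder) \
        __atomic_fetch_add(ptr, val, memorder)

/* ...and keep the "_explicit" spelling as a backwards-compatibility alias. */
#define rte_atomic_fetch_add_explicit(ptr, val, memorder) \
        rte_atomic_fetch_add(ptr, val, memorder)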

> 
> I suggested fixing the original <rte_atomic.h> API, or at least having a
> wrapper API, already at the point DPDK moved to direct GCC built-in
> calls. Then we wouldn't have had this atomics API ping-pong.

The decision back then might have been too hasty, and based on incomplete 
assumptions.
Please shout louder next time you think a mistake is in the making.

> 
> >>
> >> When was this API introduced? Shouldn't it say "experimental"
> >> somewhere?
> >
> > They were introduced as part of the migration to C11.
> > I suppose they were not marked experimental because they replaced
> something we didn't want anymore (the compiler built-ins for atomics,
> e.g. __atomic_load_n()). I don't recall if we discussed experimental
> marking or not.

In hindsight, they should probably have been marked "experimental".



[PATCH] app/testpmd: support updating flow rule actions

2024-01-26 Thread Oleksandr Kolomeiets
"flow actions_update" updates a flow rule specified by a rule ID with a
new action list by making a call to "rte_flow_actions_update()":

flow actions_update {port_id} {rule_id}
actions {action} [/ {action} [...]] / end [user_id]

Creating, updating and destroying a flow rule:

testpmd> flow create 0 group 1 pattern eth / end actions drop / end
Flow rule #0 created
testpmd> flow actions_update 0 0 actions queue index 1 / end
Flow rule #0 updated with new actions
testpmd> flow destroy 0 rule 0
Flow rule #0 destroyed

Signed-off-by: Oleksandr Kolomeiets 
Reviewed-by: Mykola Kostenok 
Reviewed-by: Christian Koue Muf 
---
 .mailmap|  3 ++
 app/test-pmd/cmdline.c  |  4 ++
 app/test-pmd/cmdline_flow.c | 34 +++-
 app/test-pmd/config.c   | 43 
 app/test-pmd/testpmd.h  |  3 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 44 +
 6 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index 6011526cae..44da31c1c2 100644
--- a/.mailmap
+++ b/.mailmap
@@ -232,6 +232,7 @@ Chintu Hetam 
 Choonho Son 
 Chris Metcalf 
 Christian Ehrhardt 
+Christian Koue Muf 
 Christian Maciocco 
 Christophe Fontaine 
 Christophe Grosse 
@@ -986,6 +987,7 @@ Mukesh Dua 
 Murphy Yang 
 Murthy NSSR 
 Muthurajan Jayakumar 
+Mykola Kostenok 
 Nachiketa Prachanda  
 Nagadheeraj Rottela 
 Naga Harish K S V 
@@ -1040,6 +1042,7 @@ Odi Assli 
 Ognjen Joldzic 
 Ola Liljedahl 
 Oleg Polyakov 
+Oleksandr Kolomeiets 
 Olga Shern  
 Olivier Gournet 
 Olivier Matz 
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f704319771..8249e4eb92 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -852,6 +852,10 @@ static void cmd_help_long_parsed(void *parsed_result,
"flow destroy {port_id} rule {rule_id} [...]\n"
"Destroy specific flow rules.\n\n"
 
+   "flow actions_update {port_id} {rule_id}"
+   " actions {action} [/ {action} [...]] / end [user_id]\n"
+   "Update a flow rule with new actions.\n\n"
+
"flow flush {port_id}\n"
"Destroy all flow rules.\n\n"
 
diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 359c187b3c..0af8d13121 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -91,6 +91,7 @@ enum index {
VALIDATE,
CREATE,
DESTROY,
+   ACTIONS_UPDATE,
FLUSH,
DUMP,
QUERY,
@@ -250,6 +251,7 @@ enum index {
VC_TUNNEL_SET,
VC_TUNNEL_MATCH,
VC_USER_ID,
+   VC_IS_USER_ID,
 
/* Dump arguments */
DUMP_ALL,
@@ -3084,6 +3086,7 @@ static const struct token token_list[] = {
  VALIDATE,
  CREATE,
  DESTROY,
+ ACTIONS_UPDATE,
  FLUSH,
  DUMP,
  LIST,
@@ -3888,6 +3891,17 @@ static const struct token token_list[] = {
.args = ARGS(ARGS_ENTRY(struct buffer, port)),
.call = parse_destroy,
},
+   [ACTIONS_UPDATE] = {
+   .name = "actions_update",
+   .help = "update a flow rule with new actions",
+   .next = NEXT(NEXT_ENTRY(VC_IS_USER_ID, END),
+NEXT_ENTRY(ACTIONS),
+NEXT_ENTRY(COMMON_RULE_ID),
+NEXT_ENTRY(COMMON_PORT_ID)),
+   .args = ARGS(ARGS_ENTRY(struct buffer, args.vc.rule_id),
+ARGS_ENTRY(struct buffer, port)),
+   .call = parse_vc,
+   },
[FLUSH] = {
.name = "flush",
.help = "destroy all flow rules",
@@ -4134,6 +4148,11 @@ static const struct token token_list[] = {
.args = ARGS(ARGS_ENTRY(struct buffer, args.vc.user_id)),
.call = parse_vc,
},
+   [VC_IS_USER_ID] = {
+   .name = "user_id",
+   .help = "rule identifier is user-id",
+   .call = parse_vc,
+   },
/* Validate/create pattern. */
[ITEM_PATTERN] = {
.name = "pattern",
@@ -8083,8 +8102,13 @@ parse_vc(struct context *ctx, const struct token *token,
if (!out->command) {
if (ctx->curr != VALIDATE && ctx->curr != CREATE &&
ctx->curr != PATTERN_TEMPLATE_CREATE &&
-   ctx->curr != ACTIONS_TEMPLATE_CREATE)
+   ctx->curr != ACTIONS_TEMPLATE_CREATE &&
+   ctx->curr != ACTIONS_UPDATE)
return -1;
+   if (ctx->curr == ACTIONS_UPDATE)
+   out->args.vc.pat

Re: [PATCH v4 1/1] ethdev: parsing multiple representor devargs string

2024-01-26 Thread Ferruh Yigit
On 1/21/2024 7:19 PM, Harman Kalra wrote:
> Adding support for parsing multiple representor devargs strings
> passed to a PCI BDF. There may be scenarios where port representors
> for various PFs, or VFs under PFs, are required, and all these
> representor ports shall be backed by a single PCI device. In such
> a case port representors can be created using a devargs string:
> ,representor=[pf[0-1],pf2vf[1,2-3],[4-5]]
> 

This patch is to be able to parse multiple representor device arguments,
but I am concerned about how they are used.

When there are multiple representor ports backed by the same device,
can't it cause synchronization issues, like if two representor interfaces
are used for conflicting configurations? Or, if the datapath will be supported,
what if two representors are used simultaneously?



Meanwhile please check some comments below related to the parser code.

<...>

> @@ -459,9 +460,26 @@ eth_dev_devargs_tokenise(struct rte_kvargs *arglist, 
> const char *str_in)
>   break;
>  
>   case 3: /* Parsing list */
> - if (*letter == ']')
> - state = 2;
> - else if (*letter == '\0')
> + if (*letter == ']') {
> + /* For devargs having singles lists move to 
> state 2 once letter
> +  * becomes ']' so each can be considered as 
> different pair key
> +  * value. But in nested lists case e.g. 
> multiple representors
> +  * case i.e. [pf[0-3],pfvf[3,4-6]], complete 
> nested list should
> +  * be considered as one pair value, hence 
> checking if end of outer
> +  * list ']' is reached else stay on state 3.
> +  */
> + if ((strcmp("representor", pair->key) == 0) 
> &&
> + (*(letter + 1) != '\0' && *(letter + 2) != 
> '\0' &&
> +  *(letter + 3) != '\0') 
> &&
> + ((*(letter + 2) == 'p' && *(letter + 3) == 
> 'f') ||
> +  (*(letter + 2) == 'v' && *(letter + 3) == 
> 'f') ||
> +  (*(letter + 2) == 's' && *(letter + 3) == 
> 'f') ||
> +  (*(letter + 2) == 'c' && isdigit(*(letter 
> + 3))) ||
> +  (*(letter + 2) == '[' && isdigit(*(letter 
> + 3)
>

The above is hard to understand, but there are some assumptions in the input;
can we list the supported syntax in the comment to make it more clear?

For example, the following seem not to be supported, can you please check if
that is intentional:
[vf0,vf1] // I am aware this can be put as vf[0,1] too
[vf[0,1],3]
[vf[0],vf[1]]

I am not saying the above should be supported, but the syntax should make it
clear what is supported and what is not.


Also I can't check but is the redundant input verified, like:
[vf[0-3],vf[3,4]]




Re: Testing scatter support for PMDs using testpmd

2024-01-26 Thread Boyer, Andrew


On Jan 24, 2024, at 12:16 PM, Jeremy Spewock  wrote:

Hello maintainers,

In porting over the first ethdev suite to the new DTS framework, there was an 
inconsistency that we found and we were wondering if anyone would be able to 
shed some light on this. In general the inconsistency pertains to Intel and 
Mellanox NICs, where one will throw an error and not start testpmd while the 
other will work as expected.

In the original DTS suite for testing scattered packets, testpmd is started 
with the flags --max-packet-len=9000 and --mbuf-size=2048. This starts and 
works fine on Intel NICs, but when you use the same flags on a Mellanox NIC, it 
will produce the error seen below. There is a flag that exists for testpmd 
called --enable-scatter and when this flag is provided, the Mellanox NIC seems 
to accept the flags and start without error.

Our assumption is that this should be consistent between NICs, is there a 
reason that one NIC would allow for starting testpmd without explicitly 
enabling scatter and the other wouldn't? Should this flag always be present, 
and is it an error that testpmd can start without it in the first place?

Here is the error provided when attempting to run on a Mellanox NIC:

mlx5_net: port 0 Rx queue 0: Scatter offload is not configured and no enough 
mbuf space(2048) to contain the maximum RX packet length(9000) with 
head-room(128)
mlx5_net: port 0 unable to allocate rx queue index 0
Fail to configure port 0 rx queues
Start ports failed

Thank you for any insight,
Jeremy

Hello Jeremy,

I can share a little bit of what I've seen while working on our devices.

The client can specify the max packet size, MTU, mbuf size, and whether to 
enable Rx or Tx s/g (separately). For performance reasons we don't want to 
enable s/g if it's not needed.

Now, the client can easily set things up with a small MTU, start processing, 
stop the port, and increase the MTU - beyond what a single mbuf would hold. To 
avoid having to tear down and rebuild the queues on an MTU change to enable s/g 
support, we automatically enable Rx s/g if the client presents mbufs which are 
too small to hold the max MTU.
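
For reference, a rough sketch of the kind of check being described (not any
specific PMD's code; names and placement are illustrative):

/* Does this Rx setup need scatter? A single mbuf, minus headroom, must be
 * able to hold the largest possible frame. */
static bool
rx_needs_scatter(const struct rte_eth_dev *dev, struct rte_mempool *mb_pool)
{
        uint32_t buf_size = rte_pktmbuf_data_room_size(mb_pool) -
                            RTE_PKTMBUF_HEADROOM;
        uint32_t max_frame = dev->data->mtu + RTE_ETHER_HDR_LEN +
                             RTE_ETHER_CRC_LEN;

        return max_frame > buf_size;
}

A PMD that finds this true while RTE_ETH_RX_OFFLOAD_SCATTER is not enabled can
either fail queue setup (as mlx5 does in the log above) or enable Rx scatter on
its own, which is the behavior described here.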

Unfortunately the API to configure the Tx queues doesn't tell us anything about 
the mbuf size, and there's nothing stopping the client from configuring Tx 
before Rx. So we can't reliably auto-enable Tx s/g, and it is possible to get 
into a config where the Rx side produces chained mbufs which the Tx side can't 
handle.

To avoid this misconfig we have some versions of our PMD set to fail to start 
if Rx s/g is enabled but Tx s/g isn't.

Hope this helps,
Andrew



AGX100 and ACC100 series for 24.03

2024-01-26 Thread Vargas, Hernan
Hi Maxime,

Kind reminder to review these 2 series for 24.03:


  1.  
https://patches.dpdk.org/project/dpdk/patch/20240123165454.104465-2-hernan.var...@intel.com/
  2.  
https://patches.dpdk.org/project/dpdk/patch/20240112203618.27094-2-hernan.var...@intel.com/

Thanks,
Hernan


Re: [24.03 RFC] argparse: add argparse library

2024-01-26 Thread Stephen Hemminger
On Thu, 25 Jan 2024 14:31:03 +0800
fengchengwen  wrote:

> Hi Stephen,
> 
> On 2024/1/24 23:54, Stephen Hemminger wrote:
> > On Tue, 21 Nov 2023 12:26:51 +
> > Chengwen Feng  wrote:
> >   
> >> Introduce argparse library (which was inspired by the thread [1]),
> >> compared with getopt, the argparse has following advantages:
> >> 1) Set the help information when defining parameters.
> >> 2) Support positional parameters.
> >>
> >> The parameters are parsed according to the following:
> >> 1) positional: use callback to parse (passed the long-name as the key
> >>for callback).
> >> 2) optional:
> >>In addition to callback to parse, but also support:
> >> 2.1) no-val: support set default value to saver.
> >> 2.2) has-val: support set value to saver, the value must be conform
> >>   RTE_ARGPARSE_ARG_VAL_xxx.
> >> 2.3) opt-val: if current without value then treat as no-val, else could
> >>   treat as has-val.  
> > 
> > How compatiable is this with Python or other implementations of argparse
> > in C?  
> 
> This library is a subset of Python argparse, and it doesn't support some
> advanced features, such as subcommands and exclusive groups. We could extend
> it to support them if needed.
> 
> As for other implementations of argparse in C, I found an interesting site [1];
> this library supports all features except the compact/order/wchar ones.
> 
> [1] 
> https://attractivechaos.wordpress.com/2018/08/31/a-survey-of-argument-parsing-libraries-in-c-c/


I was looking at https://github.com/cofyc/argparse?tab=readme-ov-file

> One good thing there is that argparse options can be defined in an array.
Like:
struct argparse_option options[] = {
OPT_HELP(),
OPT_GROUP("Basic options"),
OPT_BOOLEAN('f', "force", &force, "force to do", NULL, 0, 0),
OPT_BOOLEAN('t', "test", &test, "test only", NULL, 0, 0),
OPT_STRING('p', "path", &path, "path to read", NULL, 0, 0),
OPT_INTEGER('i', "int", &int_num, "selected integer", NULL, 0, 0),
OPT_FLOAT('s', "float", &flt_num, "selected float", NULL, 0, 0),
OPT_GROUP("Bits options"),
OPT_BIT(0, "read", &perms, "read perm", NULL, PERM_READ, OPT_NONEG),
OPT_BIT(0, "write", &perms, "write perm", NULL, PERM_WRITE, 0),
OPT_BIT(0, "exec", &perms, "exec perm", NULL, PERM_EXEC, 0),
OPT_END(),
};


Re: [PATCH v3 0/2] ethdev: add the check for PTP capability

2024-01-26 Thread Ferruh Yigit
On 1/11/2024 6:25 AM, lihuisong (C) wrote:
> Hi Ferruh,
> 
> On 2023/11/23 19:56, lihuisong (C) wrote:
>>
>> On 2023/11/2 7:39, Ferruh Yigit wrote:
>>> timesync_read_rx_timestamp
>>> On 9/21/2023 12:59 PM, lihuisong (C) wrote:
 add ice & igc maintainers

 On 2023/9/21 19:06, Ferruh Yigit wrote:
> On 9/21/2023 11:02 AM, lihuisong (C) wrote:
>> Hi Ferruh,
>>
>> Sorry for my delay reply because of taking a look at all PMDs
>> implementation.
>>
>>
>> On 2023/9/16 1:46, Ferruh Yigit wrote:
>>> On 8/17/2023 9:42 AM, Huisong Li wrote:
    From the first version of ptpclient, it seems that this example
 assume that
 the PMDs support the PTP feature and enable PTP by default.
 Please see
 commit ab129e9065a5 ("examples/ptpclient: add minimal PTP client")
 which are introduced in 2015.

 And two years later, Rx HW timestamp offload was introduced to
 enable or
 disable PTP feature in HW via rte_eth_rxmode. Please see
 commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability").

>>> Hi Huisong,
>>>
>>> As far as I know this offload is not for PTP.
>>> PTP and TIMESTAMP are different.
>> If TIMESTAMP offload cannot stand for PTP, we may need to add one new
>> offload for PTP.
>>
> Can you please detail what is "PTP offload"?
>
 It indicates whether the device supports PTP or enables the PTP feature.

>>> We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()'
>>> APIs to control PTP support.
>> No, this is just to control it.
>> We still need something like a device capability to report to the application
>> whether the port supports calling this API, right?
>>>
>>> But when mention from "offload", it is something device itself does.
>>>
>>> PTP is a protocol (IEEE 1588), and used to synchronize clocks.
>>> What I get is protocol can be parsed by networking stack and it can be
>>> used by application to synchronize clock.
>>>
>>> When you are refer to "PTP offload", does it mean device (NIC)
>>> understands the protocol and parse it to synchronize device clock with
>>> other devices?
>> Good point. PTP offload is unreasonable.
>> But the capability is required indeed.
>> What do you think of introducing a RTE_ETH_DEV_PTP in
>> dev->data->dev_flags for PTP feature?
> 
> Can you take a look at this discussion line again?
> 
> Let's introduce a RTE_ETH_DEV_PTP in dev->data->dev_flags to reveal if
> the device supports the PTP feature.
> 

Hi Huisong,

First let me try to summarize the discussion since it has been some time.

HW timer block can be used for Rx timestamp offload
(RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or PTP support, but they are two
different things.

This patch uses 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP
support, which is wrong.



After we have agreed on the above, your next question is whether to use
'dev_flags' to report the PTP capability.

First, can you please describe what the motivation is to learn whether the HW
supports PTP or not, and what the benefit of knowing this is.

If we agree that there is a need to know the PTP capability, question is
where to report this capability.

The suggested 'dev_flags' is used for various things, some are internal
flags and some are status; I don't think overloading this variable is
a good idea.

Another option is to update 'rte_eth_dev_info_get()' for it, but it has the
same problem: this function is already overloaded with many different
things.

We can have an API just to get the PTP capability, but this will require a
new dev_ops; this can be an option, but my concern is that having a capability
dev_ops for each feature creates a mess in dev_ops.

Perhaps we can have a single get_capability() API that takes features as
flags and returns whether each feature is supported or not, but this requires a
wider discussion.

Instead, can we deduce the capability from the PTP-relevant dev_ops: if they
are implemented we can say PTP is supported? This doesn't require new
support.
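
A rough sketch of that "deduce from dev_ops" idea (ethdev-internal and
illustrative only; the helper name is made up):

/* Hypothetical ethdev-layer helper: consider PTP supported when the
 * driver implements the timesync callbacks. */
static int
eth_dev_ptp_supported(const struct rte_eth_dev *dev)
{
        const struct eth_dev_ops *ops = dev->dev_ops;

        return ops->timesync_enable != NULL &&
               ops->timesync_disable != NULL &&
               ops->timesync_read_rx_timestamp != NULL &&
               ops->timesync_read_tx_timestamp != NULL;
}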




RE: rte_atomic_*_explicit

2024-01-26 Thread Honnappa Nagarahalli


> >
> >>
> >> On Thu, Jan 25, 2024 at 11:10:47PM +0100, Morten Brørup wrote:
>  From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
>  Sent: Thursday, 25 January 2024 19.54
> 
>  Why do rte_stdatomic.h functions have the suffix "_explicit"?
>  Especially
>  since there aren't any wrappers for the implicit variants.
> 
>  More to type, more to read.
> >>>
> >>> They have the "_explicit" suffix to make their names similar to
> >>> those in
> >> stdatomic.h.
> >>>
> >>> You might consider their existence somewhat temporary until C11
> >>> stdatomics
> >> can be fully phased in, so there's another argument for similar
> >> names. (This probably does not happen as long as compilers generate
> >> slower code for C11 stdatomics than with their atomic built-ins.)
> >>
> >> yes, there was feedback at the time it was.
> >>
> >> * we should *not* have non-explicit versions of the macros
> >> * the atomic generic functions should be named to match C11 standard
> >>with a rte_ prefix.
> > This was mainly done to ensure that users think through the memory
> ordering they want to use. This also matches with the compiler atomic built-
> ins. Without explicit, it is sequentially consistent memory order.
> >
> 
> "This" is referring to the first bullet only, correct?
> 
> You don't have to distinguish between implicit and explicit if you only have
> explicit.
Agree on your thought.

The '_explicit' was added to be aligned with the standard atomic API naming. 
The thought was - if we are aligned on the names, it needs less explanation for 
users.

> 
> >>
> >>>
> 
>  When was this API introduced? Shouldn't it say "experimental"
>  somewhere?
> >>>
> >>> They were introduced as part of the migration to C11.
> >>> I suppose they were not marked experimental because they replaced
> >> something we didn't want anymore (the compiler built-ins for atomics, e.g.
> >> __atomic_load_n()). I don't recall if we discussed experimental marking or
> not.
> >>
> >> i don't think we discussed it since they're wrapper macros.
> >>
> >>>
> >>>
> >>> Reverse paper trail:
> >>> https://git.dpdk.org/dpdk/log/lib/eal/include/rte_stdatomic.h
> >>> https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git
> >>> -
> >> send-email-roret...@linux.microsoft.com/
> >>> https://patchwork.dpdk.org/project/dpdk/patch/1692738045-32363-2-git
> >>> -
> >> send-email-roret...@linux.microsoft.com/
> >>>


[PATCH v4 1/7] net/gve: fully expose RSS offload support in dev_info

2024-01-26 Thread Joshua Washington
This patch communicates that the GVE driver supports RSS, along with
the RSS offloads supported by the driver.

Signed-off-by: Joshua Washington 
Reviewed-by: Rushil Gupta 
Reviewed-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c | 4 +++-
 drivers/net/gve/gve_ethdev.h | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index d162fd3864..6acdb4e13b 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -405,7 +405,7 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->max_mtu = priv->max_mtu;
dev_info->min_mtu = RTE_ETHER_MIN_MTU;
 
-   dev_info->rx_offload_capa = 0;
+   dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_RSS_HASH;
dev_info->tx_offload_capa =
RTE_ETH_TX_OFFLOAD_MULTI_SEGS   |
RTE_ETH_TX_OFFLOAD_UDP_CKSUM|
@@ -442,6 +442,8 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.nb_align = 1,
};
 
+   dev_info->flow_type_rss_offloads = GVE_RSS_OFFLOAD_ALL;
+
return 0;
 }
 
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index 9893fcfee6..14c72ec91a 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -33,6 +33,14 @@
RTE_MBUF_F_TX_L4_MASK  |\
RTE_MBUF_F_TX_TCP_SEG)
 
+#define GVE_RSS_OFFLOAD_ALL (  \
+   RTE_ETH_RSS_IPV4 |  \
+   RTE_ETH_RSS_NONFRAG_IPV4_TCP |  \
+   RTE_ETH_RSS_IPV6 |  \
+   RTE_ETH_RSS_IPV6_EX |   \
+   RTE_ETH_RSS_NONFRAG_IPV6_TCP |  \
+   RTE_ETH_RSS_IPV6_TCP_EX)
+
 /* A list of pages registered with the device during setup and used by a queue
  * as buffers
  */
-- 
2.43.0.429.g432eaa2c6b-goog



[PATCH v4 2/7] net/gve: RSS adminq command changes

2024-01-26 Thread Joshua Washington
This change introduces admin queue changes that enable the configuration
of RSS parameters for the GVE driver.

Signed-off-by: Joshua Washington 
Reviewed-by: Rushil Gupta 
Reviewed-by: Jeroen de Borst 
---
 drivers/net/gve/base/gve.h| 15 
 drivers/net/gve/base/gve_adminq.c | 58 +++
 drivers/net/gve/base/gve_adminq.h | 29 
 drivers/net/gve/gve_ethdev.h  |  6 +++-
 4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gve/base/gve.h b/drivers/net/gve/base/gve.h
index f7b297e759..9c58fc4238 100644
--- a/drivers/net/gve/base/gve.h
+++ b/drivers/net/gve/base/gve.h
@@ -51,4 +51,19 @@ enum gve_state_flags_bit {
GVE_PRIV_FLAGS_NAPI_ENABLED = 4,
 };
 
+enum gve_rss_hash_algorithm {
+   GVE_RSS_HASH_UNDEFINED = 0,
+   GVE_RSS_HASH_TOEPLITZ = 1,
+};
+
+struct gve_rss_config {
+   uint16_t hash_types;
+   enum gve_rss_hash_algorithm alg;
+   uint16_t key_size;
+   uint16_t indir_size;
+   uint8_t *key;
+   uint32_t *indir;
+};
+
+
 #endif /* _GVE_H_ */
diff --git a/drivers/net/gve/base/gve_adminq.c b/drivers/net/gve/base/gve_adminq.c
index 343bd13d67..629d15cfbe 100644
--- a/drivers/net/gve/base/gve_adminq.c
+++ b/drivers/net/gve/base/gve_adminq.c
@@ -389,6 +389,9 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
case GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES:
priv->adminq_dcfg_device_resources_cnt++;
break;
+   case GVE_ADMINQ_CONFIGURE_RSS:
+   priv->adminq_cfg_rss_cnt++;
+   break;
case GVE_ADMINQ_SET_DRIVER_PARAMETER:
priv->adminq_set_driver_parameter_cnt++;
break;
@@ -938,3 +941,58 @@ int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
gve_free_dma_mem(&ptype_map_dma_mem);
return err;
 }
+
+int gve_adminq_configure_rss(struct gve_priv *priv,
+struct gve_rss_config *rss_config)
+{
+   struct gve_dma_mem indirection_table_dma_mem;
+   struct gve_dma_mem rss_key_dma_mem;
+   union gve_adminq_command cmd;
+   __be32 *indir = NULL;
+   u8 *key = NULL;
+   int err = 0;
+   int i;
+
+   if (!rss_config->indir_size || !rss_config->key_size)
+   return -EINVAL;
+
+   indir = gve_alloc_dma_mem(&indirection_table_dma_mem,
+ rss_config->indir_size *
+   sizeof(*rss_config->indir));
+   if (!indir) {
+   err = -ENOMEM;
+   goto out;
+   }
+   for (i = 0; i < rss_config->indir_size; i++)
+   indir[i] = cpu_to_be32(rss_config->indir[i]);
+
+   key = gve_alloc_dma_mem(&rss_key_dma_mem,
+   rss_config->key_size *
+   sizeof(*rss_config->key));
+   if (!key) {
+   err = -ENOMEM;
+   goto out;
+   }
+   memcpy(key, rss_config->key, rss_config->key_size);
+
+   memset(&cmd, 0, sizeof(cmd));
+   cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_RSS);
+   cmd.configure_rss = (struct gve_adminq_configure_rss) {
+   .hash_types = cpu_to_be16(rss_config->hash_types),
+   .halg = rss_config->alg,
+   .hkey_len = cpu_to_be16(rss_config->key_size),
+   .indir_len = cpu_to_be16(rss_config->indir_size),
+   .hkey_addr = cpu_to_be64(rss_key_dma_mem.pa),
+   .indir_addr = cpu_to_be64(indirection_table_dma_mem.pa),
+   };
+
+   err = gve_adminq_execute_cmd(priv, &cmd);
+
+out:
+   if (indir)
+   gve_free_dma_mem(&indirection_table_dma_mem);
+   if (key)
+   gve_free_dma_mem(&rss_key_dma_mem);
+   return err;
+}
+
diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index f05362f85f..95f4960561 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -19,6 +19,7 @@ enum gve_adminq_opcodes {
GVE_ADMINQ_DESTROY_TX_QUEUE = 0x7,
GVE_ADMINQ_DESTROY_RX_QUEUE = 0x8,
GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES = 0x9,
+   GVE_ADMINQ_CONFIGURE_RSS= 0xA,
GVE_ADMINQ_SET_DRIVER_PARAMETER = 0xB,
GVE_ADMINQ_REPORT_STATS = 0xC,
GVE_ADMINQ_REPORT_LINK_SPEED= 0xD,
@@ -377,6 +378,27 @@ struct gve_adminq_get_ptype_map {
__be64 ptype_map_addr;
 };
 
+#define GVE_RSS_HASH_IPV4  BIT(0)
+#define GVE_RSS_HASH_TCPV4 BIT(1)
+#define GVE_RSS_HASH_IPV6  BIT(2)
+#define GVE_RSS_HASH_IPV6_EX   BIT(3)
+#define GVE_RSS_HASH_TCPV6 BIT(4)
+#define GVE_RSS_HASH_TCPV6_EX  BIT(5)
+#define GVE_RSS_HASH_UDPV4 BIT(6)
+#define GVE_RSS_HASH_UDPV6 BIT(7)
+#define GVE_RSS_HASH_UDPV6_EX  BIT(8)
+
+/* RSS configuration command */

[PATCH v4 3/7] net/gve: add gve_rss library for handling RSS-related behaviors

2024-01-26 Thread Joshua Washington
This change includes a number of helper functions to facilitate RSS
configuration on the GVE DPDK driver. These methods are declared in
gve_rss.h.

Signed-off-by: Joshua Washington 
Reviewed-by: Rushil Gupta 
Reviewed-by: Jeroen de Borst 
---
 drivers/net/gve/base/gve_adminq.h |  10 +-
 drivers/net/gve/gve_ethdev.c  |   2 +-
 drivers/net/gve/gve_ethdev.h  |   4 +-
 drivers/net/gve/gve_rss.c | 206 ++
 drivers/net/gve/gve_rss.h | 107 
 drivers/net/gve/meson.build   |   1 +
 6 files changed, 319 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/gve/gve_rss.c
 create mode 100644 drivers/net/gve/gve_rss.h

diff --git a/drivers/net/gve/base/gve_adminq.h b/drivers/net/gve/base/gve_adminq.h
index 95f4960561..24abd945cc 100644
--- a/drivers/net/gve/base/gve_adminq.h
+++ b/drivers/net/gve/base/gve_adminq.h
@@ -378,15 +378,6 @@ struct gve_adminq_get_ptype_map {
__be64 ptype_map_addr;
 };
 
-#define GVE_RSS_HASH_IPV4  BIT(0)
-#define GVE_RSS_HASH_TCPV4 BIT(1)
-#define GVE_RSS_HASH_IPV6  BIT(2)
-#define GVE_RSS_HASH_IPV6_EX   BIT(3)
-#define GVE_RSS_HASH_TCPV6 BIT(4)
-#define GVE_RSS_HASH_TCPV6_EX  BIT(5)
-#define GVE_RSS_HASH_UDPV4 BIT(6)
-#define GVE_RSS_HASH_UDPV6 BIT(7)
-#define GVE_RSS_HASH_UDPV6_EX  BIT(8)
 
 /* RSS configuration command */
 struct gve_adminq_configure_rss {
@@ -428,6 +419,7 @@ union gve_adminq_command {
 GVE_CHECK_UNION_LEN(64, gve_adminq_command);
 
 struct gve_priv;
+struct gve_rss_config;
 struct gve_queue_page_list;
 int gve_adminq_alloc(struct gve_priv *priv);
 void gve_adminq_free(struct gve_priv *priv);
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 6acdb4e13b..936ca22cb9 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -442,7 +442,7 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
.nb_align = 1,
};
 
-   dev_info->flow_type_rss_offloads = GVE_RSS_OFFLOAD_ALL;
+   dev_info->flow_type_rss_offloads = GVE_RTE_RSS_OFFLOAD_ALL;
 
return 0;
 }
diff --git a/drivers/net/gve/gve_ethdev.h b/drivers/net/gve/gve_ethdev.h
index f7635e829c..bc486cb941 100644
--- a/drivers/net/gve/gve_ethdev.h
+++ b/drivers/net/gve/gve_ethdev.h
@@ -33,7 +33,7 @@
RTE_MBUF_F_TX_L4_MASK  |\
RTE_MBUF_F_TX_TCP_SEG)
 
-#define GVE_RSS_OFFLOAD_ALL (  \
+#define GVE_RTE_RSS_OFFLOAD_ALL (  \
RTE_ETH_RSS_IPV4 |  \
RTE_ETH_RSS_NONFRAG_IPV4_TCP |  \
RTE_ETH_RSS_IPV6 |  \
@@ -290,6 +290,8 @@ struct gve_priv {
const struct rte_memzone *stats_report_mem;
uint16_t stats_start_idx; /* start index of array of stats written by 
NIC */
uint16_t stats_end_idx; /* end index of array of stats written by NIC */
+
+   struct gve_rss_config rss_config;
 };
 
 static inline bool
diff --git a/drivers/net/gve/gve_rss.c b/drivers/net/gve/gve_rss.c
new file mode 100644
index 00..931180f8f2
--- /dev/null
+++ b/drivers/net/gve/gve_rss.c
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Google LLC
+ */
+
+#include "gve_rss.h"
+
+int
+gve_generate_rss_reta(struct rte_eth_dev *dev, struct gve_rss_config *config)
+{
+   int i;
+   if (!config || !config->indir)
+   return -EINVAL;
+
+   for (i = 0; i < config->indir_size; i++)
+   config->indir[i] = i % dev->data->nb_rx_queues;
+
+   return 0;
+}
+
+
+int
+gve_init_rss_config(struct gve_rss_config *gve_rss_conf,
+   uint16_t key_size, uint16_t indir_size)
+{
+   int err;
+
+   gve_rss_conf->alg = GVE_RSS_HASH_TOEPLITZ;
+
+   gve_rss_conf->key_size = key_size;
+   gve_rss_conf->key = rte_zmalloc("rss key",
+   key_size * sizeof(*gve_rss_conf->key),
+   RTE_CACHE_LINE_SIZE);
+   if (!gve_rss_conf->key)
+   return -ENOMEM;
+
+   gve_rss_conf->indir_size = indir_size;
+   gve_rss_conf->indir = rte_zmalloc("rss reta",
+   indir_size * sizeof(*gve_rss_conf->indir),
+   RTE_CACHE_LINE_SIZE);
+   if (!gve_rss_conf->indir) {
+   err = -ENOMEM;
+   goto err_with_key;
+   }
+
+   return 0;
+err_with_key:
+   rte_free(gve_rss_conf->key);
+   return err;
+}
+
+int
+gve_init_rss_config_from_priv(struct gve_priv *priv,
+   struct gve_rss_config *gve_rss_conf)
+{
+   int err = gve_init_rss_config(gve_rss_conf, priv->rss_config.key_size,
+   priv->rss_config.indir_size);
+   if (err)
+   return err;
+
+   gve_rss_conf->hash_types = priv->rss_config.hash_types;
+   gve_rss_conf->alg = priv->rss_config.alg;
+   memcpy(gve_rss_conf->key, priv->rss_config.key,
+   gve_rss_conf->key_size * sizeof(*gve_rs

[PATCH v4 4/7] net/gve: RSS configuration update support

2024-01-26 Thread Joshua Washington
This patch adds support for updating the RSS hash key and hash fields
in the GVE PMD through the implementation of rss_hash_update and
rss_hash_conf_get.

The RSS hash key for gVNIC is required to be 40 bytes. On initial
configuration of the RSS hash key, the RSS redirection table will be
set to a static default, using a round-robin approach for all queues.
Note, however, that this patch does not include support for setting the
redirection table explicitly. In dev_configure, if the static
redirection table has been set, it will be updated to reflect the new
queue count, if it has changed.

The RSS key must be set before any other RSS configuration can happen.
As such, an attempt to set the hash types before the key is configured
will fail.
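
As a usage sketch against the standard ethdev API (not part of the patch): the 40-byte Toeplitz key must be programmed before any other RSS setup, so an application might start with something like the following (the key bytes are placeholders, and the `algorithm` field assumes an ethdev version that exposes it, as the code below checks):

#include <rte_ethdev.h>

static int
configure_gve_rss(uint16_t port_id)
{
	static uint8_t rss_key[40] = { /* 40 real key bytes go here */ };
	struct rte_eth_rss_conf rss_conf = {
		.rss_key = rss_key,
		.rss_key_len = sizeof(rss_key),
		.rss_hf = RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_NONFRAG_IPV4_TCP,
		.algorithm = RTE_ETH_HASH_FUNCTION_TOEPLITZ,
	};

	return rte_eth_dev_rss_hash_update(port_id, &rss_conf);
}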

Signed-off-by: Joshua Washington 
Reviewed-by: Rushil Gupta 
Reviewed-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c | 132 ++-
 drivers/net/gve/gve_ethdev.h |   3 +
 2 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 936ca22cb9..2a68d31808 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(C) 2022 Intel Corporation
+ * Copyright(C) 2022-2023 Intel Corporation
+ * Copyright(C) 2023 Google LLC
  */
 
 #include "gve_ethdev.h"
@@ -8,6 +9,7 @@
 #include "base/gve_osdep.h"
 #include "gve_version.h"
 #include "rte_ether.h"
+#include "gve_rss.h"
 
 static void
 gve_write_version(uint8_t *driver_version_register)
@@ -88,12 +90,31 @@ gve_dev_configure(struct rte_eth_dev *dev)
 {
struct gve_priv *priv = dev->data->dev_private;
 
-   if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
+   if (dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) {
dev->data->dev_conf.rxmode.offloads |= 
RTE_ETH_RX_OFFLOAD_RSS_HASH;
+   priv->rss_config.alg = GVE_RSS_HASH_TOEPLITZ;
+   }
 
if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO)
priv->enable_rsc = 1;
 
+   /* Reset RSS RETA in case number of queues changed. */
+   if (priv->rss_config.indir) {
+   struct gve_rss_config update_reta_config;
+   gve_init_rss_config_from_priv(priv, &update_reta_config);
+   gve_generate_rss_reta(dev, &update_reta_config);
+
+   int err = gve_adminq_configure_rss(priv, &update_reta_config);
+   if (err)
+   PMD_DRV_LOG(ERR,
+   "Could not reconfigure RSS redirection table.");
+   else
+   gve_update_priv_rss_config(priv, &update_reta_config);
+
+   gve_free_rss_config(&update_reta_config);
+   return err;
+   }
+
return 0;
 }
 
@@ -443,6 +464,8 @@ gve_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
};
 
dev_info->flow_type_rss_offloads = GVE_RTE_RSS_OFFLOAD_ALL;
+   dev_info->hash_key_size = GVE_RSS_HASH_KEY_SIZE;
+   dev_info->reta_size = GVE_RSS_INDIR_SIZE;
 
return 0;
 }
@@ -646,6 +669,107 @@ gve_xstats_get_names(struct rte_eth_dev *dev,
return count;
 }
 
+
+static int
+gve_rss_hash_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_conf *rss_conf)
+{
+   struct gve_priv *priv = dev->data->dev_private;
+   struct gve_rss_config gve_rss_conf;
+   int rss_reta_size;
+   int err;
+
+   if (gve_validate_rss_hf(rss_conf->rss_hf)) {
+   PMD_DRV_LOG(ERR, "Unsupported hash function.");
+   return -EINVAL;
+   }
+
+   if (rss_conf->algorithm != RTE_ETH_HASH_FUNCTION_TOEPLITZ &&
+   rss_conf->algorithm != RTE_ETH_HASH_FUNCTION_DEFAULT) {
+   PMD_DRV_LOG(ERR, "Device only supports Toeplitz algorithm.");
+   return -EINVAL;
+   }
+
+   if (rss_conf->rss_key_len) {
+   if (rss_conf->rss_key_len != GVE_RSS_HASH_KEY_SIZE) {
+   PMD_DRV_LOG(ERR,
+   "Invalid hash key size. Only RSS hash key size "
+   "of %u supported", GVE_RSS_HASH_KEY_SIZE);
+   return -EINVAL;
+   }
+
+   if (!rss_conf->rss_key) {
+   PMD_DRV_LOG(ERR, "RSS key must be non-null.");
+   return -EINVAL;
+   }
+   } else {
+   if (!priv->rss_config.key_size) {
+   PMD_DRV_LOG(ERR, "RSS key must be initialized before "
+   "any other configuration.");
+   return -EINVAL;
+   }
+   rss_conf->rss_key_len = priv->rss_config.key_size;
+   }
+
+   rss_reta_size = priv->rss_config.indir ?
+   priv->rss_config.indir_size :
+   GVE_RSS_INDIR_SIZE

[PATCH v4 5/7] net/gve: RSS redirection table update support

2024-01-26 Thread Joshua Washington
This patch introduces support for updating the RSS redirection table in
the GVE PMD through the implementation of rss_reta_update and
rss_reta_query.

Due to an infrastructure limitation, the RSS hash key must be manually
configured before the redirection table can be updated or queried. The
redirection table is expected to have exactly 128 entries.
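
As a usage sketch (not part of the patch), the 128-entry table can be spread round-robin across the active Rx queues through the standard ethdev API, assuming the hash key has already been configured:

#include <string.h>
#include <rte_ethdev.h>

static int
spread_gve_reta(uint16_t port_id, uint16_t nb_rx_queues)
{
	struct rte_eth_rss_reta_entry64 reta_conf[128 / RTE_ETH_RETA_GROUP_SIZE];
	uint16_t i;

	memset(reta_conf, 0, sizeof(reta_conf));
	for (i = 0; i < 128; i++) {
		/* mark the entry as valid and map it round-robin to a queue */
		reta_conf[i / RTE_ETH_RETA_GROUP_SIZE].mask |=
			1ULL << (i % RTE_ETH_RETA_GROUP_SIZE);
		reta_conf[i / RTE_ETH_RETA_GROUP_SIZE].reta[i % RTE_ETH_RETA_GROUP_SIZE] =
			i % nb_rx_queues;
	}

	return rte_eth_dev_rss_reta_update(port_id, reta_conf, 128);
}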

Signed-off-by: Joshua Washington 
Reviewed-by: Rushil Gupta 
Reviewed-by: Jeroen de Borst 
---
 drivers/net/gve/gve_ethdev.c | 95 
 1 file changed, 95 insertions(+)

diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index 2a68d31808..3b8ec5872d 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -770,6 +770,97 @@ gve_rss_hash_conf_get(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+gve_rss_reta_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size)
+{
+   struct gve_priv *priv = dev->data->dev_private;
+   struct gve_rss_config gve_rss_conf;
+   int table_id;
+   int err;
+   int i;
+
+   /* RSS key must be set before the redirection table can be set. */
+   if (!priv->rss_config.key || priv->rss_config.key_size == 0) {
+   PMD_DRV_LOG(ERR, "RSS hash key must be set before the "
+   "redirection table can be updated.");
+   return -ENOTSUP;
+   }
+
+   if (reta_size != GVE_RSS_INDIR_SIZE) {
+   PMD_DRV_LOG(ERR, "Redirection table must have %hu elements",
+   (uint16_t)GVE_RSS_INDIR_SIZE);
+   return -EINVAL;
+   }
+
+   err = gve_init_rss_config_from_priv(priv, &gve_rss_conf);
+   if (err) {
+   PMD_DRV_LOG(ERR, "Error allocating new RSS config.");
+   return err;
+   }
+
+   table_id = 0;
+   for (i = 0; i < priv->rss_config.indir_size; i++) {
+   int table_entry = i % RTE_ETH_RETA_GROUP_SIZE;
+   if (reta_conf[table_id].mask & (1ULL << table_entry))
+   gve_rss_conf.indir[i] =
+   reta_conf[table_id].reta[table_entry];
+
+   if (table_entry == RTE_ETH_RETA_GROUP_SIZE - 1)
+   table_id++;
+   }
+
+   err = gve_adminq_configure_rss(priv, &gve_rss_conf);
+   if (err)
+   PMD_DRV_LOG(ERR, "Problem configuring RSS with device.");
+   else
+   gve_update_priv_rss_config(priv, &gve_rss_conf);
+
+   gve_free_rss_config(&gve_rss_conf);
+   return err;
+}
+
+static int
+gve_rss_reta_query(struct rte_eth_dev *dev,
+   struct rte_eth_rss_reta_entry64 *reta_conf, uint16_t reta_size)
+{
+   struct gve_priv *priv = dev->data->dev_private;
+   int table_id;
+   int i;
+
+   if (!(dev->data->dev_conf.rxmode.offloads &
+   RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
+   PMD_DRV_LOG(ERR, "RSS not configured.");
+   return -ENOTSUP;
+   }
+
+   /* RSS key must be set before the redirection table can be queried. */
+   if (!priv->rss_config.key) {
+   PMD_DRV_LOG(ERR, "RSS hash key must be set before the "
+   "redirection table can be initialized.");
+   return -ENOTSUP;
+   }
+
+   if (reta_size != priv->rss_config.indir_size) {
+   PMD_DRV_LOG(ERR, "RSS redirection table must have %d entries.",
+   priv->rss_config.indir_size);
+   return -EINVAL;
+   }
+
+   table_id = 0;
+   for (i = 0; i < priv->rss_config.indir_size; i++) {
+   int table_entry = i % RTE_ETH_RETA_GROUP_SIZE;
+   if (reta_conf[table_id].mask & (1ULL << table_entry))
+   reta_conf[table_id].reta[table_entry] =
+   priv->rss_config.indir[i];
+
+   if (table_entry == RTE_ETH_RETA_GROUP_SIZE - 1)
+   table_id++;
+   }
+
+   return 0;
+}
+
 static const struct eth_dev_ops gve_eth_dev_ops = {
.dev_configure= gve_dev_configure,
.dev_start= gve_dev_start,
@@ -792,6 +883,8 @@ static const struct eth_dev_ops gve_eth_dev_ops = {
.xstats_get_names = gve_xstats_get_names,
.rss_hash_update  = gve_rss_hash_update,
.rss_hash_conf_get= gve_rss_hash_conf_get,
+   .reta_update  = gve_rss_reta_update,
+   .reta_query   = gve_rss_reta_query,
 };
 
 static const struct eth_dev_ops gve_eth_dev_ops_dqo = {
@@ -816,6 +909,8 @@ static const struct eth_dev_ops gve_eth_dev_ops_dqo = {
.xstats_get_names = gve_xstats_get_names,
.rss_hash_update  = gve_rss_hash_update,
.rss_hash_conf_get= gve_rss_hash_conf_get,
+   .reta_update  = gve_rss_reta_update,
+   .reta_query   = gve_rss_reta_query,
 };
 
 static void
-- 
2.43.0.429.g432eaa2c6b-goog



[PATCH v4 6/7] net/gve: update gve.ini with RSS capabilities

2024-01-26 Thread Joshua Washington
This patch updates the DPDK feature matrix to expose that the GVE driver
supports RSS hash, RSS key update, and RSS reta update.

Signed-off-by: Joshua Washington 
---
 doc/guides/nics/features/gve.ini | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/nics/features/gve.ini b/doc/guides/nics/features/gve.ini
index 838edd456a..bc08648dbc 100644
--- a/doc/guides/nics/features/gve.ini
+++ b/doc/guides/nics/features/gve.ini
@@ -15,3 +15,6 @@ Linux= Y
 x86-32   = Y
 x86-64   = Y
 Usage doc= Y
+RSS hash = Y
+RSS key update   = Y
+RSS reta update  = Y
-- 
2.43.0.429.g432eaa2c6b-goog



[PATCH v4 7/7] net/gve: update GVE documentation with RSS support

2024-01-26 Thread Joshua Washington
This patch updates the GVE doc page to communicate that GVE now supports
RSS configuration and explains the limitations.

Signed-off-by: Joshua Washington 
---
 doc/guides/nics/gve.rst | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/gve.rst b/doc/guides/nics/gve.rst
index 1c3eaf03ef..80991e70cf 100644
--- a/doc/guides/nics/gve.rst
+++ b/doc/guides/nics/gve.rst
@@ -70,6 +70,8 @@ Supported features of the GVE PMD are:
 - Link state information
 - Tx multi-segments (Scatter Tx)
 - Tx UDP/TCP/SCTP Checksum
+- RSS hash configuration
+- RSS redirection table update and query
 
 Currently, only GQI_QPL and GQI_RDA queue format are supported in PMD.
 Jumbo Frame is not supported in PMD for now.
@@ -77,10 +79,12 @@ It'll be added in a future DPDK release.
 Also, only GQI_QPL queue format is in use on GCP
 since GQI_RDA hasn't been released in production.
 
-Currently, setting MTU with value larger than 1460 is not supported.
+RSS
+^^^
 
-Currently, only "RSS hash" is force enabled
-so that the backend hardware device calculated hash values
-could be shared with applications.
-But for RSS, there is no such API to config RSS hash function or RETA table.
-So, limited RSS is supported only with default config/setting.
+GVE RSS can be enabled and configured using the standard interfaces. The driver
+does not support querying the initial RSS configuration.
+
+The RSS hash key must be exactly 40 bytes, and the redirection table must have
+128 entries. The RSS hash key must be configured before the redirection table
+can be set up.
-- 
2.43.0.429.g432eaa2c6b-goog



Re: rte_atomic_*_explicit

2024-01-26 Thread Tyler Retzlaff
On Fri, Jan 26, 2024 at 04:58:54PM +, Honnappa Nagarahalli wrote:
> 
> 
> > >
> > >>
> > >> On Thu, Jan 25, 2024 at 11:10:47PM +0100, Morten Brørup wrote:
> >  From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >  Sent: Thursday, 25 January 2024 19.54
> > 
> >  Why do rte_stdatomic.h functions have the suffix "_explicit"?
> >  Especially
> >  since there aren't any wrappers for the implicit variants.
> > 
> >  More to type, more to read.
> > >>>
> > >>> They have the "_explicit" suffix to make their names similar to
> > >>> those in
> > >> stdatomic.h.
> > >>>
> > >>> You might consider their existence somewhat temporary until C11
> > >>> stdatomics
> > >> can be fully phased in, so there's another argument for similar
> > >> names. (This probably does not happen as long as compilers generate
> > >> slower code for C11 stdatomics than with their atomic built-ins.)
> > >>
> > >> yes, there was feedback at the time it was introduced:
> > >>
> > >> * we should *not* have non-explicit versions of the macros
> > >> * the atomic generic functions should be named to match C11 standard
> > >>with a rte_ prefix.
> > > This was mainly done to ensure that users think through the memory
> > ordering they want to use. This also matches with the compiler atomic built-
> > ins. Without explicit, it is sequentially consistent memory order.
> > >
> > 
> > "This" is referring to the first bullet only, correct?
> > 
> > You don't have to distinguish between implicit and explicit if you only have
> > explicit.
> I agree with your thought.
> 
> The '_explicit' suffix was added to align with the standard atomic API naming.
> The thought was that if the names are aligned, users need less explanation.

Well, another benefit is that in the future, if/when a decision is made to
use the names from the standard directly, it is trivial to just
strip the rte_ prefix tree-wide.


Re: rte_atomic_*_explicit

2024-01-26 Thread Tyler Retzlaff
On Fri, Jan 26, 2024 at 11:52:11AM +0100, Morten Brørup wrote:
> > From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> > Sent: Friday, 26 January 2024 09.07
> > 
> > On 2024-01-25 23:10, Morten Brørup wrote:
> > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> > >> Sent: Thursday, 25 January 2024 19.54
> > >>
> > >> Why do rte_stdatomic.h functions have the suffix "_explicit"?
> > >> Especially
> > >> since there aren't any wrappers for the implicit variants.
> > >>
> > >> More to type, more to read.
> > >
> > > They have the "_explicit" suffix to make their names similar to those
> > in stdatomic.h.
> > >
> > 
> > OK, so to avoid a situation where someone accidentally misinterpret
> > rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed);
> > as what, exactly?
> > 
> > If you have a wrapper, why not take the opportunity and reap some
> > benefits from this and fix/extend the standard API, making it a better
> > fit for your application. Like removing the redundant "_explicit",
> > "fetch"-less add/sub, maybe and a function for single-writer atomic add
> > (non-atomic read + non-atomic add + atomic store), etc.
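
A minimal sketch of that single-writer add idea (the helper name is hypothetical, not an existing DPDK API; only the single writer core calls it, and readers use rte_atomic_load_explicit()):

#include <stdint.h>
#include <rte_stdatomic.h>

static inline void
rte_counter_add_single_writer(RTE_ATOMIC(uint64_t) *counter, uint64_t val)
{
	uint64_t cur = rte_atomic_load_explicit(counter,
						rte_memory_order_relaxed);

	/* a single writer means no CPU-level read-modify-write is needed */
	rte_atomic_store_explicit(counter, cur + val,
				  rte_memory_order_relaxed);
}
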
> > 
> > Prohibiting implicit operations was done already, so it's already now
> > not a straight-off copy of the standard API.
> > 
> > > You might consider their existence somewhat temporary until C11
> > stdatomics can be fully phased in, so there's another argument for
> > similar names. (This probably does not happen as long as compilers
> > generate slower code for C11 stdatomics than with their atomic built-
> > ins.)
> > >
> > 
> > To me, it seems reasonable a wrapper API should stay as long as it
> > provide a benefit over the standard API. One could imagine that being a
> > long time.
> > 
> > I imagine some DPDK developers being tired of migrating from one
> > atomics
> > API to another. <rte_atomic.h> -> GCC built-ins (-> attempted C11
> > stdatomics?) -> <rte_stdatomic.h>. Now you are adding a future "-> CXY
> > atomics" move as well, and with that it also seems natural to add a
> > change back to a wrapper or complementary API, when CXY didn't turned
> > out good enough for some particular platform, or when some non-
> > complaint
> > compiler comes along.
> 
> Yes, more migrations seem to be on the roadmap.
> 
> We can take the opportunity to change direction now, and decide to keep the
> <rte_stdatomic.h> API long term.
> Then it would need more documentation (basically copying function
> descriptions from <stdatomic.h>), and the functions could have the
> "_explicit" suffix removed (macros with the suffix could be added for 
> backwards compatibility), and more functions - like the ones you suggested 
> above - could be added.
> 
> What do people think?
> 1. Keep considering <rte_stdatomic.h> a temporary wrapper for <stdatomic.h>
> until compilers reach some undefined level of maturity, or
> 2. Consider <rte_stdatomic.h> stable, clean it up (remove "_explicit"
> suffix), add documentation to the macros, and extend it.
> 
> Living in a DPDK-only environment, I would prefer option 2; but if mixing
> DPDK code with non-DPDK code (that uses <stdatomic.h>) it might be weird.

rte_stdatomic.h should be considered temporary, but how long temporary
is depends on when we can deprecate support for distributions and the
older toolchains they are tied to.

The macros were introduced to allow a path to gradually moving to
standard C11 atomics. The gcc versions available on distributions we
promise support for are currently the biggest barrier to direct
adoption.

* older versions of gcc generate sub-optimal code when using c11
  atomic generics in some instances.

* gcc C++23 support is incomplete; in particular, at the time we
  evaluated using the C11 atomics directly, gcc C++23 held us
  back because the stdatomic.h it shipped didn't interoperate.

* the macros allow developers to easily evaluate/compare with and
  without standard C atomics via a single build-time option,
  -Denable_stdatomic=true (see the sketch after this list)

* eventually we will want to move directly to the names from the
  standard; arguably a plain search and replace of rte_atomic with
  atomic is the most mechanically trivial path - hence there is some
  value in keeping the _explicit suffix.
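
A rough sketch of how the _explicit wrappers can sit on top of either backend (this is not the actual <rte_stdatomic.h> contents, and it assumes RTE_ENABLE_STDATOMIC is the config flag behind -Denable_stdatomic):

#ifdef RTE_ENABLE_STDATOMIC
#include <stdatomic.h>
#define rte_memory_order_relaxed memory_order_relaxed
#define rte_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#else
#define rte_memory_order_relaxed __ATOMIC_RELAXED
#define rte_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#endif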

> > 
> > I suggested fixing the original <rte_atomic.h> API, or at least have a
> > wrapper API, already at the point DPDK moved to direct GCC built-in
> > calls. Then we wouldn't have had this atomics API ping-pong.
> 
> The decision back then might have been too hasty, and based on incomplete 
> assumptions.
> Please shout louder next time you think a mistake is in the making.
> 
> > 
> > >>
> > >> When was this API introduced? Shouldn't it say "experimental"
> > >> somewhere?
> > >
> > > They were introduced as part of the migration to C11.
> > > I suppose they were not marked experimental because they replaced
> > something we didn't want anymore (the compiler built-ins for atomics,
> > e.g. __atomic_load_n()). I don't recall if we discussed experimental
> > marking or not.
> 
> In hindsight, they should probably have been marked experimental.

Re: [PATCH v3 0/2] ethdev: add the check for PTP capability

2024-01-26 Thread lihuisong (C)



On 2024/1/27 0:54, Ferruh Yigit wrote:

On 1/11/2024 6:25 AM, lihuisong (C) wrote:

Hi Ferruh,

On 2023/11/23 19:56, lihuisong (C) wrote:

On 2023/11/2 7:39, Ferruh Yigit wrote:

timesync_read_rx_timestamp
On 9/21/2023 12:59 PM, lihuisong (C) wrote:

add ice & igc maintainers

On 2023/9/21 19:06, Ferruh Yigit wrote:

On 9/21/2023 11:02 AM, lihuisong (C) wrote:

Hi Ferruh,

Sorry for my delay reply because of taking a look at all PMDs
implementation.


On 2023/9/16 1:46, Ferruh Yigit wrote:

On 8/17/2023 9:42 AM, Huisong Li wrote:

    From the first version of ptpclient, it seems that this example
assumes that
the PMDs support the PTP feature and enable PTP by default.
Please see
commit ab129e9065a5 ("examples/ptpclient: add minimal PTP client"),
which was introduced in 2015.

And two years later, Rx HW timestamp offload was introduced to
enable or
disable the PTP feature in HW via rte_eth_rxmode. Please see
commit 42ffc45aa340 ("ethdev: add Rx HW timestamp capability").


Hi Huisong,

As far as I know this offload is not for PTP.
PTP and TIMESTAMP are different.

If TIMESTAMP offload cannot stand for PTP, we may need to add a new
offload for PTP.


Can you please detail what is "PTP offload"?


It indicates whether the device supports PTP or enables the PTP feature.


We have 'rte_eth_timesync_enable()' and 'rte_eth_timesync_disable()'
APIs to control PTP support.

No, this is just to control it.
We still need something like a device capability to report to the application
whether the port supports calling this API, right?

But when we mention "offload", it is something the device itself does.

PTP is a protocol (IEEE 1588) used to synchronize clocks.
What I get is that the protocol can be parsed by the networking stack and can be
used by the application to synchronize clocks.

When you refer to "PTP offload", does it mean the device (NIC)
understands the protocol and parses it to synchronize the device clock with
other devices?

Good point. PTP offload is unreasonable.
But the capability is indeed required.
What do you think of introducing an RTE_ETH_DEV_PTP flag in
dev->data->dev_flags for the PTP feature?

Can you take a look at this discussion line again?

Let's introduce an RTE_ETH_DEV_PTP flag in dev->data->dev_flags to reveal whether
the device supports the PTP feature.


Hi Ferruh,

Thanks for taking your time to reply.


Hi Huisong,

First let me try to summarize the discussion since it has been some time.

HW timer block can be used for Rx timestamp offload
(RTE_ETH_RX_OFFLOAD_TIMESTAMP) and/or PTP support, but they are two
different things.

This patch uses 'RTE_ETH_RX_OFFLOAD_TIMESTAMP' capability for PTP
support, which is wrong.


correct.


After we agreed on the above, your next question is whether to use 'dev_flag' to
report the PTP capability.

First, can you please describe the motivation for learning whether HW
supports PTP or not, and what the benefit of knowing this is?

There are a couple of devices which share the same driver on a platform or OS,
but only one device is allowed to support or be responsible for the PTP feature.
The firmware reports a capability to tell the driver whether the device
supports PTP.
Currently, however, the driver has no way to report to the user which device
supports the PTP feature.


In addition, many drivers use RTE_LIBRTE_IEEE1588 to control PTP code flow.
Whether the device supports PTP is irrelevant to this macro.


If we agree that there is a need to know the PTP capability, the question is
where to report this capability.

The suggested 'dev_flags' is used for various things, some internal
flags and some status; I don't think overloading this variable is
a good idea.

Yes, we need to consider  carefully.


Another option is to update 'rte_eth_dev_info_get()' for it, but it has the
same problem: this function is already overloaded with many different
things.

We can have an API just to get the PTP capability, but this will require a
new dev_ops. This can be an option, but my concern is that having a capability
dev_ops for each feature creates a mess in dev_ops.

Perhaps we can have a single get_capability() API that takes features as
flags and returns whether each feature is supported or not, but this requires a
wider discussion.

Instead, can we deduce the capability from the PTP-relevant dev_ops: if they
are implemented, we can say it is supported? This doesn't require new
support.
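
A rough ethdev-internal sketch of that idea (the helper name is hypothetical; the dev_ops fields come from the driver-facing header):

#include <ethdev_driver.h>

static int
eth_dev_ptp_supported(const struct rte_eth_dev *dev)
{
	/* deduce PTP support from the presence of timesync callbacks */
	return dev->dev_ops->timesync_enable != NULL &&
	       dev->dev_ops->timesync_read_rx_timestamp != NULL;
}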

Thank you mentioning so many ways.
For the end of advice, I don't think it is appropriate.
Because we have to modify dynamically the pointer address of all PTP 
APIs in dev_ops on the above case.


How about using rte_eth_dev_info.dev_capa to report the PTP capability?
This field is specifically used to report "Non-offload capabilities" according
to its documentation.
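
As a sketch of how an application could probe such a flag if it were added (RTE_ETH_DEV_CAPA_PTP is hypothetical and does not exist in ethdev today):

#include <stdint.h>
#include <rte_bitops.h>
#include <rte_ethdev.h>

/* hypothetical capability bit, only for illustration */
#define RTE_ETH_DEV_CAPA_PTP RTE_BIT64(10)

static int
port_supports_ptp(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;

	if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
		return 0;

	return (dev_info.dev_capa & RTE_ETH_DEV_CAPA_PTP) != 0;
}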




