Re: [RESEND v7 0/3] add telemetry cmds for ring

2024-01-27 Thread Jie Hai

Hi, Thomas,

Kindly ping for review.

Thanks,
Jie Hai


On 2023/11/9 18:20, Jie Hai wrote:

This patch set adds telemetry commands to list rings and to dump
information about a ring by its name.

v1->v2:
1. Add space after "switch".
2. Fix wrong strlen parameter.

v2->v3:
1. Remove prefix "rte_" for static function.
2. Add Acked-by Konstantin Ananyev for PATCH 1.
3. Introduce functions to return strings instead of copying strings.
4. Check pointer to memzone of ring.
5. Remove redundant variable.
6. Hold lock when access ring data.

v3->v4:
1. Update changelog according to reviews of Honnappa Nagarahalli.
2. Add Reviewed-by Honnappa Nagarahalli.
3. Correct grammar in help information.
4. Correct spell warning on "te" reported by checkpatch.pl.
5. Use ring_walk() to query ring info instead of rte_ring_lookup().
6. Fix mismatch between the type definition of the rte_ring flags field and its usage.
7. Use rte_tel_data_add_dict_uint_hex instead of rte_tel_data_add_dict_u64
for mask and flags.

v4->v5:
1. Add Acked-by Konstantin Ananyev and Chengwen Feng.
2. Add ABI change explanation for commit message of patch 1/3.

v5->v6:
1. Add Acked-by Morten Brørup.
2. Fix incorrect reference of commit.

v6->v7:
1. Remove prod/consumer head/tail info.

Jie Hai (3):
   ring: fix unmatched type definition and usage
   ring: add telemetry cmd to list rings
   ring: add telemetry cmd for ring info

  lib/ring/meson.build |   1 +
  lib/ring/rte_ring.c  | 135 +++
  lib/ring/rte_ring_core.h |   2 +-
  3 files changed, 137 insertions(+), 1 deletion(-)



Potential RTE bitset RFC

2024-01-27 Thread Mattias Rönnblom

Hi.

The new timer RFC ("htimer") I submitted last year also included a new 
bitset API.


https://patchwork.dpdk.org/project/dpdk/patch/20230315170342.214127-2-mattias.ronnb...@ericsson.com/

My experience is that multi-word bitsets are often useful. Examples from 
DPDK are rte_service.c and DSW's "service ports" bitset (both have 64 
as a hard upper limit). Small but multi-word bitsets are not 
particularly hard to open-code, but then you end up with a lot of 
duplication.
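
To illustrate the kind of duplication referred to above, here is a minimal open-coded multi-word bitset of the sort components end up re-implementing. The names are illustrative only, not the proposed RTE bitset API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITSET_WORD_BITS 64
#define BITSET_WORDS(n) (((n) + BITSET_WORD_BITS - 1) / BITSET_WORD_BITS)

/* Set a bit in a bitset spanning one or more 64-bit words. */
static inline void
bitset_set(uint64_t *words, size_t bit)
{
	words[bit / BITSET_WORD_BITS] |= UINT64_C(1) << (bit % BITSET_WORD_BITS);
}

/* Clear a bit. */
static inline void
bitset_clear(uint64_t *words, size_t bit)
{
	words[bit / BITSET_WORD_BITS] &= ~(UINT64_C(1) << (bit % BITSET_WORD_BITS));
}

/* Test whether a bit is set. */
static inline bool
bitset_test(const uint64_t *words, size_t bit)
{
	return (words[bit / BITSET_WORD_BITS] >> (bit % BITSET_WORD_BITS)) & 1;
}
```

A dedicated API would also cover iteration, find-first-set, and atomic variants, which is where the open-coded versions really start to diverge from each other.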


I wanted to ask if there is an interest in seeing a bitset API (as per 
my patchset) in DPDK.


Upstreaming htimer, including having it replace today's rte_timer is 
more work than I can commit to, so I think you won't get RTE bitset that 
way any time soon.


Regards,
Mattias


Re: rte_atomic_*_explicit

2024-01-27 Thread Mattias Rönnblom

On 2024-01-26 11:52, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Friday, 26 January 2024 09.07

On 2024-01-25 23:10, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Thursday, 25 January 2024 19.54

Why do rte_stdatomic.h functions have the suffix "_explicit"?
Especially
since there aren't any wrappers for the implicit variants.

More to type, more to read.


They have the "_explicit" suffix to make their names similar to those
in stdatomic.h.




OK, so to avoid a situation where someone accidentally misinterprets
rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed);
as what, exactly?

If you have a wrapper, why not take the opportunity to reap some
benefits from it and fix/extend the standard API, making it a better
fit for your application? Like removing the redundant "_explicit",
adding "fetch"-less add/sub, and maybe a function for single-writer atomic add
(non-atomic read + non-atomic add + atomic store), etc.

Prohibiting implicit operations was done already, so it's already
not a straight-off copy of the standard API.
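
The "single-writer atomic add" mentioned above could look roughly like this in plain C11 atomics. The function name is hypothetical, not an existing DPDK API; relaxed loads/stores compile to plain loads/stores on mainstream ISAs, which is the point of the optimization:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical single-writer add: the counter has exactly one writer
 * thread, so the read-modify part need not be an atomic RMW. A relaxed
 * load plus a relaxed store suffices, and readers on other cores still
 * observe a consistent, monotonically increasing value.
 */
static inline void
counter_add_single_writer(atomic_uint_least64_t *counter, uint64_t val)
{
	uint64_t cur = atomic_load_explicit(counter, memory_order_relaxed);

	atomic_store_explicit(counter, cur + val, memory_order_relaxed);
}
```

On x86, for example, this avoids a `lock xadd` in favor of a plain add and store, which matters for frequently updated statistics counters.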


You might consider their existence somewhat temporary until C11
stdatomics can be fully phased in, so there's another argument for
similar names. (This probably does not happen as long as compilers
generate slower code for C11 stdatomics than with their atomic
built-ins.)




To me, it seems reasonable that a wrapper API should stay as long as it
provides a benefit over the standard API. One could imagine that being a
long time.

I imagine some DPDK developers being tired of migrating from one atomics
API to another. <rte_atomic.h> -> GCC built-ins (-> attempted C11
stdatomics?) -> <rte_stdatomic.h>. Now you are adding a future "-> CXY
atomics" move as well, and with that it also seems natural to add a
change back to a wrapper or complementary API, when CXY doesn't turn
out good enough for some particular platform, or when some non-compliant
compiler comes along.


Yes, more migrations seem to be on the roadmap.

We can take the opportunity to change direction now, and decide to keep the 
<rte_stdatomic.h> API long term.
Then it would need more documentation (basically copying function descriptions from 
<stdatomic.h>), and the functions could have the "_explicit" suffix removed 
(macros with the suffix could be added for backwards compatibility), and more functions - like 
the ones you suggested above - could be added.

What do people think?
1. Keep considering <rte_stdatomic.h> a temporary wrapper for <stdatomic.h> 
until compilers reach some undefined level of maturity, or
2. Consider <rte_stdatomic.h> stable, clean it up (remove the "_explicit" suffix), 
add documentation to the macros, and extend it.

Living in a DPDK-only environment, I would prefer option 2; but if mixing DPDK code 
with non-DPDK code (that uses <stdatomic.h>) it might be weird.



I think DPDK should have its own atomics API, with C11-style naming and 
semantics, where there is an overlap, and keep it until the heat death 
of the universe.


Is there an example of a large-scale project of the OS kernel or network 
stack kind that *doesn't* have its own atomics API (which wraps/extends 
C11 atomics, and/or uses compiler built-ins or in-line assembler)?


fd.io VPP, OpenDataPlane (ODP) and the Linux kernel all have custom APIs.



I suggested fixing the original <rte_atomic.h> API, or at least adding a
wrapper API, already at the point DPDK moved to direct GCC built-in
calls. Then we wouldn't have had this atomics API ping-pong.


The decision back then might have been too hasty, and based on incomplete 
assumptions.
Please shout louder next time you think a mistake is in the making.





When was this API introduced? Shouldn't it say "experimental"
somewhere?


They were introduced as part of the migration to C11.
I suppose they were not marked experimental because they replaced
something we didn't want anymore (the compiler built-ins for atomics,
e.g. __atomic_load_n()). I don't recall if we discussed experimental
marking or not.


In hindsight, they should probably have been marked "experimental".



Re: [PATCH] RFC: use C11 alignas instead of GCC attribute aligned

2024-01-27 Thread Mattias Rönnblom

On 2024-01-26 11:18, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Friday, 26 January 2024 11.05

On 2024-01-25 23:53, Morten Brørup wrote:

From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
Sent: Thursday, 25 January 2024 19.37

ping.

Please review this thread if you have time; the main points of
discussion on which I would like to reach consensus are the following
questions.

1. Should we continue to expand common alignments behind an __rte_ macro?

i.e. what do we prefer to appear in code:

alignas(RTE_CACHE_LINE_MIN_SIZE)

-- or --

__rte_cache_aligned

One of the benefits of dropping the macro is that it provides a clear
visual indicator that it is not placed in the same location, nor applied
to types, as is done with __attribute__((__aligned__(n))).


We don't want our own proprietary variant of something that already
exists in the C standard. Now that we have moved to C11, the __rte
alignment macros should be considered obsolete.

Making something cache-line aligned is not in C11.


We are talking about the __rte_aligned() macro, not the cache alignment macro.



OK, in that case, what is the relevance of question 1 above?



__rte_cache_aligned is shorter, provides a tiny bit of abstraction, and
is already an established DPDK standard. So just keep the macro. If it
would change, I would argue for it to be changed to rte_cache_aligned
(i.e., just moving it out of __ namespace, and maybe making it
all-uppercase).

Non-trivial C programs wrap things all the time, standard or not. It's
not something to be overly concerned about, imo.


Using the cache alignment macro was obviously a bad example for discussing the 
__rte_aligned() macro.

FYI, Tyler later agreed to introducing the RTE_CACHE_ALIGNAS you had proposed 
in an earlier correspondence.





Note: I don't mind convenience macros for common use cases, so we
could also introduce the macro suggested by Mattias [1]:


#define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)

[1]: https://inbox.dpdk.org/dev/dc3f3131-38e6-4219-861e-b31ec10c0...@lysator.liu.se/
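
A short self-contained sketch of how such a macro behaves; RTE_CACHE_LINE_SIZE is defined locally here as a stand-in (DPDK's real definition lives in rte_common.h and is 64 bytes on most, but not all, supported platforms):

```c
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for the DPDK definition; purely for illustration. */
#define RTE_CACHE_LINE_SIZE 64
#define RTE_CACHE_ALIGNAS alignas(RTE_CACHE_LINE_SIZE)

/*
 * Aligning the first field forces the whole struct to cache-line
 * alignment, e.g. so per-lcore counters don't suffer false sharing.
 */
struct lcore_stats {
	RTE_CACHE_ALIGNAS uint64_t rx_pkts;
	uint64_t tx_pkts;
};
```

Because the aligned member raises the struct's alignment to 64, an array of `struct lcore_stats` places each element on its own cache line.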




2. Where should we place alignas(n) or __rte_macro (if we use a macro)?

Should it be on the same line as the variable or field, or on the
preceding line?

/* same line example struct */
struct T {
    /* alignas(64) applies to field0, *not* the struct T type declaration */
    alignas(64) void *field0;
    void *field1;

    ... other fields ...

    alignas(64) uint64_t field5;
    uint32_t field6;

    ... more fields ...

};

/* same line example array */
alignas(64) static const uint32_t array[4] = { ... };

-- or --

/* preceding line example struct */
struct T {
    /* alignas(64) applies to field0, *not* the struct T type declaration */
    alignas(64)
    void *field0;
    void *field1;

    ... other fields ...

    alignas(64)
    uint64_t field5;
    uint32_t field6;

    ... more fields ...

};

/* preceding line example array */
alignas(64)
static const uint32_t array[4] = { ... };



Searching the net for what other projects do, I came across this
required placement [2]:

uint64_t alignas(64) field5;

[2]: https://lore.kernel.org/buildroot/2023073851.6faa3391@windsurf/T/


So let's follow the standard's intention and put them on the same line.

On a case-by-case basis, we can wrap lines if it improves
readability, like we do with function headers that have a lot of
attributes.




I'll submit patches for lib/* once the discussion is concluded.

thanks folks




Re: [RFC] service: extend service function call statistics

2024-01-27 Thread Mattias Rönnblom

On 2024-01-26 11:07, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Friday, 26 January 2024 09.28

On 2024-01-26 00:19, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
Sent: Thursday, 25 January 2024 20.15

Add two new per-service counters.

RTE_SERVICE_ATTR_IDLE_CALL_COUNT tracks the number of service function
invocations where no work was performed.

RTE_SERVICE_ATTR_ERROR_CALL_COUNT tracks the number of invocations
resulting in an error.

The semantics of RTE_SERVICE_ATTR_CALL_COUNT remains the same (i.e.,
counting all invocations, regardless of return value).
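
The relationship between the three counters described above could be sketched like this. This is illustrative only, not rte_service.c code, and the return-value convention (negative = error, a designated code for "no work") is an assumption:

```c
#include <stdint.h>

/* Assumed convention for this sketch: 0 = did work,
 * RC_NO_WORK = nothing to do, negative = error. */
#define RC_NO_WORK 1

struct service_stats {
	uint64_t calls;       /* all invocations (RTE_SERVICE_ATTR_CALL_COUNT) */
	uint64_t idle_calls;  /* no work performed (..._IDLE_CALL_COUNT) */
	uint64_t error_calls; /* returned an error (..._ERROR_CALL_COUNT) */
};

static void
service_run_and_count(struct service_stats *stats,
		      int (*service_fn)(void *), void *userdata)
{
	int rc = service_fn(userdata);

	stats->calls++; /* counted regardless of return value */
	if (rc == RC_NO_WORK)
		stats->idle_calls++;
	else if (rc < 0)
		stats->error_calls++;
}
```

With this split, `calls - idle_calls` gives the number of productive invocations, which is what makes the per-call latency calculation mentioned above meaningful.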

The new statistics may be useful for both debugging and profiling
(e.g., calculate the average per-call processing latency for non-idle
service calls).

Service core tests are extended to cover the new counters, and
coverage for RTE_SERVICE_ATTR_CALL_COUNT is improved.


OK to all of the above. Good stuff.



The documentation for the CYCLES attributes is updated to reflect
their actual semantics.


If this is intended behavior, then updating the documentation seems
appropriate - I would even go so far as considering it a bug fix.

However, quite a few cycles may be consumed by a service before it
can conclude that it had no work to do. Shouldn't that be considered
time spent by the service? I.e. should the code be fixed instead of the
documentation?




Generally, polling for new work in the service is cheap, in my
experience. But there's nothing in principle that prevents the
situation
you describe from occurring. You could add an "IDLE_CYCLES" counter in
case that would ever be a real-world problem.

That wouldn't be a fix, but rather just returning to the old, subtly
broken, (pre-22.11?) semantics.

Have a look at 809bd24 to see the rationale for the change. There's an
example in 4689c57.

The cause of this ambiguity is that the platform/OS
(i.e., DPDK) doesn't know which service to "wake up" (which in turn is
the result of the platform not being able to track input sources,
like NIC RX queues or timer wheels) and thus must ask every service to
check if it has something to do.


OK. Makes good sense.
So definitely fix the documentation, not the code. :-)




Alternatively, keep the behavior (for backwards compatibility) and
fix the documentation, as this patch does, and add an IDLE_CYCLES
counter for time spent in idle calls.


PS: We're not using DPDK service cores in our applications, so I'm
not familiar with the details. We are using something somewhat similar
(but homegrown), also for profiling and power management purposes, and
my feedback is based on my experience with our own variant of service
cores.




When are you making the switch to service cores? :)


Our own "service cores" implementation has some slightly different properties, 
which we are still experimenting with.

E.g. in addition to the special return value "idle (no work to do)", we also have a 
special return value for "incomplete (more work urgently pending)" when a service 
has processed a full burst and still has more work pending in its input queue.

We are also considering returning a value to indicate when the service needs to be 
called again. This concept is only an idea, and we haven't started 
experimenting with it yet.


From a high level perspective, the service cores library is much like an 
operating system's CPU scheduler, although based on voluntary time sharing. 
Many algorithms and many parameters can be considered. It can also tie into 
power management and prioritization of different tasks.



Service cores in their current form are more like a primitive kernel 
bottom half implementation.


The primary work scheduler in DPDK is Eventdev, but even with Eventdev 
you want some way to fit in non-event kinds of processing as well, and here 
service cores serve a role.


rte_service.c is a good place for power management. I agree some means 
for the services to convey what latency (e.g., sleep time) is acceptable 
is needed. Returning a time per service function invocation would be a 
flexible way to do so. Could also just be a per-lcore, or global 
configuration, set by the application.
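
The "return a time per service function invocation" idea could be sketched as follows. Everything here is hypothetical (the idea is only being discussed above, and these names are invented): each service returns the delay until it next needs to run, and the lcore sleeps for the minimum across its services:

```c
#include <stdint.h>

/*
 * Hypothetical dispatch helper: each service function returns the delay,
 * in microseconds, until it next needs to be called. The lcore can then
 * sleep for the minimum delay across all its mapped services.
 */
static uint64_t
next_wakeup_us(uint64_t (*services[])(void *), void *args[], int n)
{
	uint64_t min_delay = UINT64_MAX;

	for (int i = 0; i < n; i++) {
		uint64_t delay = services[i](args[i]);

		if (delay < min_delay)
			min_delay = delay;
	}

	return min_delay; /* candidate sleep time for power management */
}
```

A per-lcore or global latency bound, as suggested above, would simply clamp this value from above.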





Either way:

Acked-by: Morten Brørup 



Thanks for the review.




Re: rte_atomic_*_explicit

2024-01-27 Thread Mattias Rönnblom

On 2024-01-26 22:35, Tyler Retzlaff wrote:

On Fri, Jan 26, 2024 at 11:52:11AM +0100, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Friday, 26 January 2024 09.07

On 2024-01-25 23:10, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Thursday, 25 January 2024 19.54

Why do rte_stdatomic.h functions have the suffix "_explicit"?
Especially
since there aren't any wrappers for the implicit variants.

More to type, more to read.


They have the "_explicit" suffix to make their names similar to those
in stdatomic.h.




OK, so to avoid a situation where someone accidentally misinterprets
rte_atomic_fetch_add(&counter, 1, rte_memory_order_relaxed);
as what, exactly?

If you have a wrapper, why not take the opportunity to reap some
benefits from it and fix/extend the standard API, making it a better
fit for your application? Like removing the redundant "_explicit",
adding "fetch"-less add/sub, and maybe a function for single-writer atomic add
(non-atomic read + non-atomic add + atomic store), etc.

Prohibiting implicit operations was done already, so it's already
not a straight-off copy of the standard API.


You might consider their existence somewhat temporary until C11
stdatomics can be fully phased in, so there's another argument for
similar names. (This probably does not happen as long as compilers
generate slower code for C11 stdatomics than with their atomic
built-ins.)




To me, it seems reasonable that a wrapper API should stay as long as it
provides a benefit over the standard API. One could imagine that being a
long time.

I imagine some DPDK developers being tired of migrating from one atomics
API to another. <rte_atomic.h> -> GCC built-ins (-> attempted C11
stdatomics?) -> <rte_stdatomic.h>. Now you are adding a future "-> CXY
atomics" move as well, and with that it also seems natural to add a
change back to a wrapper or complementary API, when CXY doesn't turn
out good enough for some particular platform, or when some non-compliant
compiler comes along.


Yes, more migrations seem to be on the roadmap.

We can take the opportunity to change direction now, and decide to keep the 
<rte_stdatomic.h> API long term.
Then it would need more documentation (basically copying function descriptions from 
<stdatomic.h>), and the functions could have the "_explicit" suffix removed 
(macros with the suffix could be added for backwards compatibility), and more functions - like 
the ones you suggested above - could be added.

What do people think?
1. Keep considering <rte_stdatomic.h> a temporary wrapper for <stdatomic.h> 
until compilers reach some undefined level of maturity, or
2. Consider <rte_stdatomic.h> stable, clean it up (remove the "_explicit" suffix), 
add documentation to the macros, and extend it.

Living in a DPDK-only environment, I would prefer option 2; but if mixing DPDK code 
with non-DPDK code (that uses <stdatomic.h>) it might be weird.


rte_stdatomic.h should be considered temporary, but how long temporary
is depends on when we can deprecate support for distributions and the
older toolchains they are tied to.

the macros were introduced to allow gradually moving to
standard c11 atomics. gcc versions available on distributions we
promise support for are currently the biggest barrier to direct
adoption.

 * older versions of gcc generate sub-optimal code when using c11
   atomic generics in some instances.

 * gcc c++23 support is incomplete; in particular, at the time we
   evaluated using the c11 atomics directly, gcc c++23 held us
   back because the stdatomic.h it shipped didn't interoperate.



So you expect at a particular point in time, both the following will be 
true:


* All currently (at that point) supported and future compilers generate 
correct and efficient code for all current and future ISAs/CPUs.
* There's nothing to add, subtract or clean up in the standard API to 
improve it for DPDK-internal and DPDK application code use.


Seeing things like

rte_atomic_fetch_add_explicit(&s->num_mapped_cores, 1, 
rte_memory_order_relaxed);


doesn't exactly make you long for "raw" C11 atomics. It's not like 
deleting the "rte_" prefix improves much on that mess.


Compare it with the kernel's:

atomic_inc(&s->num_mapped_cores);

What could in DPDK be:

rte_atomic_inc(&s->num_mapped_cores);

or, at a bare minimum

rte_atomic_inc(&s->num_mapped_cores, rte_memory_order_relaxed);
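
Such a shorthand could be layered on the C11 generics in a couple of lines. The name `rte_atomic_inc` is the hypothetical one proposed above, not part of rte_stdatomic.h today, and the relaxed ordering is an assumption suitable for plain statistics counters:

```c
#include <stdatomic.h>

/*
 * Hypothetical convenience macro in the spirit of the kernel's
 * atomic_inc(): a relaxed increment over the standard C11 generic.
 */
#define rte_atomic_inc(ptr) \
	((void)atomic_fetch_add_explicit((ptr), 1, memory_order_relaxed))
```

A stricter-ordering variant taking an explicit memory-order argument could coexist, matching the second form sketched above.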


 * the macros allow developers to easily evaluate/compare with and
   without standard C atomics with a single build-time option
   -Denable_stdatomic=true

 * eventually we will want to move directly to the names from the
   standard; arguably just a search and replace of rte_atomic with
   atomic is the most mechanically trivial - hence there is some
   value in keeping the _explicit suffix.



<rte_stdatomic.h> is a public API, and thus will be picked up by 
applications wanting to leverage DPDK for such services. 
Search-and-replace will only affect DPDK itself.




I suggested fixing the original  API, or at least have a
wrapper API, al