On 5/2/2024 6:51 AM, Mattias Rönnblom wrote:
> On 2024-05-01 18:19, Ferruh Yigit wrote:
>> On 4/28/2024 4:11 PM, Mattias Rönnblom wrote:
>>> On 2024-04-26 16:38, Ferruh Yigit wrote:
>>>> For stats reset, use an offset instead of zeroing out the actual
>>>> stats values; get_stats() displays the diff between the stats and
>>>> the offset.
>>>> This way the stats are only updated in the datapath and the offset
>>>> is only updated in the stats reset function. This makes the stats
>>>> reset function more reliable.
>>>>
>>>> As the stats are only written by a single thread, we can remove the
>>>> 'volatile' qualifier, which should improve performance in the
>>>> datapath.
>>>>
>>>
>>> volatile wouldn't help you if you had multiple writers, so that
>>> can't be the reason for its removal. It would be more accurate to
>>> say it should be replaced with atomic updates. If you don't use
>>> volatile and don't use atomics, you have to consider whether the
>>> compiler can reach the conclusion that it does not need to store the
>>> counter value for future use *for that thread* - since otherwise,
>>> the store does not actually need to occur. Since DPDK statistics
>>> tend to work, it's pretty obvious that current compilers tend not to
>>> reach this conclusion.
>>>
>>
>> Thanks Mattias for clarifying why we need volatile or atomics even
>> with a single writer.
>>
>>> If this should be done 100% properly, the update operation should be
>>> a non-atomic load, a non-atomic add, and an atomic store. Similarly,
>>> for the reset, the offset store should be atomic.
>>>
>>
>> ack
>>
>>> Considering the state of the rest of the DPDK code base, I think a
>>> non-atomic, non-volatile solution is also fine.
>>>
>>
>> Yes, this seems to work in practice, but I guess it is better to
>> follow the above suggestion.
>>
>>> (That said, I think we're better off just deprecating stats reset
>>> altogether, and returning -ENOTSUP here meanwhile.)
>>>
>>
>> As long as reset is reliable (here I mean it resets the stats on
>> every call) and doesn't impact datapath performance, I am for
>> continuing with it. Returning not-supported won't bring more benefit
>> to users.
>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> ---
>>>> Cc: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>>> Cc: Stephen Hemminger <step...@networkplumber.org>
>>>> Cc: Morten Brørup <m...@smartsharesystems.com>
>>>>
>>>> This update was triggered by the mail list discussion [1].
>>>>
>>>> [1]
>>>> https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969...@lysator.liu.se/
>>>>
>>>> v2:
>>>> * Remove wrapping check for stats
>>>> ---
>>>>  drivers/net/af_packet/rte_eth_af_packet.c | 66 ++++++++++++++---------
>>>>  1 file changed, 41 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index 397a32db5886..10c8e1e50139 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -51,8 +51,10 @@ struct pkt_rx_queue {
>>>>      uint16_t in_port;
>>>>      uint8_t vlan_strip;
>>>> -    volatile unsigned long rx_pkts;
>>>> -    volatile unsigned long rx_bytes;
>>>> +    uint64_t rx_pkts;
>>>> +    uint64_t rx_bytes;
>>>> +    uint64_t rx_pkts_offset;
>>>> +    uint64_t rx_bytes_offset;
>>>
>>> I suggest you introduce a separate struct for reset-able counters.
>>> It'll make things cleaner, and you can sneak in atomics without too
>>> much atomics-related bloat.
>>>
>>> struct counter
>>> {
>>>     uint64_t count;
>>>     uint64_t offset;
>>> };
>>>
>>> /../
>>>     struct counter rx_pkts;
>>>     struct counter rx_bytes;
>>> /../
>>>
>>> static uint64_t
>>> counter_value(struct counter *counter)
>>> {
>>>     uint64_t count = __atomic_load_n(&counter->count,
>>>         __ATOMIC_RELAXED);
>>>     uint64_t offset = __atomic_load_n(&counter->offset,
>>>         __ATOMIC_RELAXED);
>>>
>>>     return count - offset;
>>> }
>>>
>>> static void
>>> counter_reset(struct counter *counter)
>>> {
>>>     uint64_t count = __atomic_load_n(&counter->count,
>>>         __ATOMIC_RELAXED);
>>>
>>>     __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
>>> }
>>>
>>> static void
>>> counter_add(struct counter *counter, uint64_t operand)
>>> {
>>>     __atomic_store_n(&counter->count, counter->count + operand,
>>>         __ATOMIC_RELAXED);
>>> }
>>>
>>
>> Ack for a separate struct for reset-able counters.
>>
>>> You'd have to port this to <rte_stdatomic.h> calls, which prevents
>>> non-atomic loads from RTE_ATOMIC()s. The non-atomic reads above must
>>> be replaced with explicit relaxed loads. Otherwise, if you just use
>>> "counter->count", that would be an atomic load with sequential
>>> consistency memory order on C11 atomics-based builds, which would
>>> result in a barrier, at least on weakly ordered machines (e.g.,
>>> ARM).
>>>
>>
>> I am not sure I understand the above.
>> As the load and add will be non-atomic, why not access them directly,
>> like:
>> `uint64_t count = counter->count;`
>>
>
> In case count is _Atomic (i.e., on enable_stdatomic=true builds),
> "count = counter->count" will imply a memory barrier. On x86_64, I
> think it will "only" be a compiler barrier (i.e., preventing
> optimization). On weakly ordered machines, it will result in a
> barrier instruction (or an instruction-which-is-also-a-barrier, like
> in the example below).
>
> #include <stdatomic.h>
>
> int relaxed_load(_Atomic int *p)
> {
>     return atomic_load_explicit(p, memory_order_relaxed);
> }
>
> int direct_load(_Atomic int *p)
> {
>     return *p;
> }
>
> GCC 13.2 ARM64 ->
>
> relaxed_load:
>     ldr w0, [x0]
>     ret
> direct_load:
>     ldar w0, [x0]
>     ret
>
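For reference, a minimal sketch of the <rte_stdatomic.h> port discussed
above. The struct and helper names follow the earlier example and are
assumptions, not the patch under review; all accesses use
rte_memory_order_relaxed, so enable_stdatomic=true builds emit no
barriers (a plain read of an RTE_ATOMIC() field would be seq-cst, i.e.
ldar on ARM64 as shown in the assembly above):

    #include <stdint.h>
    #include <rte_stdatomic.h>

    struct counter {
        RTE_ATOMIC(uint64_t) count;
        RTE_ATOMIC(uint64_t) offset;
    };

    static uint64_t
    counter_value(struct counter *counter)
    {
        /* Explicit relaxed loads instead of plain reads. */
        uint64_t count = rte_atomic_load_explicit(&counter->count,
            rte_memory_order_relaxed);
        uint64_t offset = rte_atomic_load_explicit(&counter->offset,
            rte_memory_order_relaxed);

        /* Report the diff against the offset taken at last reset. */
        return count - offset;
    }

    static void
    counter_reset(struct counter *counter)
    {
        uint64_t count = rte_atomic_load_explicit(&counter->count,
            rte_memory_order_relaxed);

        /* Snapshot the running count as the new offset. */
        rte_atomic_store_explicit(&counter->offset, count,
            rte_memory_order_relaxed);
    }

    static void
    counter_add(struct counter *counter, uint64_t operand)
    {
        /* Relaxed load + regular add + relaxed store: correct with a
         * single writer, and the store guarantees the updated value
         * actually reaches memory. */
        uint64_t count = rte_atomic_load_explicit(&counter->count,
            rte_memory_order_relaxed);

        rte_atomic_store_explicit(&counter->count, count + operand,
            rte_memory_order_relaxed);
    }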
Do we need to declare count as '_Atomic'? I wasn't planning to make the
variable _Atomic; this way the assignment won't introduce any memory
barrier.

>> So my understanding is: remove `volatile`, load and add without
>> atomics, and only use a relaxed ordered atomic for the store (to
>> ensure the value in the register is stored to memory).
>>
>
> Yes, that would be the best option, if the DPDK atomics API allowed
> its implementation - but it doesn't. At least not if you care about
> what happens in enable_stdatomic=true builds.
>
> The second-best option is to use a rte_memory_order_relaxed atomic
> load, a regular non-atomic add, and a relaxed atomic store.
>
>> I will send a new version of the RFC with the above understanding.
>>
>>> I would still use a struct and some helper functions even for the
>>> less ambitious, non-atomic variant.
>>>
>>> The only drawback of using GCC built-in atomics here, versus an
>>> atomic- and volatile-free approach, is that current compilers seem
>>> to refuse to merge atomic stores. It's beyond me why this is the
>>> case. If you store to a variable twice in quick succession, it'll be
>>> two store machine instructions, even in cases where the compiler
>>> *knows* the value is identical. So volatile, even though you didn't
>>> ask for it. Weird.
>>>
>>> So if you have a loop, you may want to make a "counter_add()" at the
>>> end from a temporary, to get the final 0.001% of performance.
>>>
>>
>> ack
>>
>> I can't really say which one of the following is better (because of
>> the store in an empty poll), but I will keep it as it is (b.):
>>
>> a.
>> for (i < nb_pkt) {
>>     stats += 1;
>> }
>>
>> b.
>> for (i < nb_pkt) {
>>     tmp += 1;
>> }
>> stats += tmp;
>>
>> c.
>> for (i < nb_pkt) {
>>     tmp += 1;
>> }
>> if (tmp)
>>     stats += tmp;
>>
>>> If the tech board thinks MT-safe reset-able software-managed
>>> statistics are the future (as opposed to dropping reset support, for
>>> example), I think this stuff should go into a separate header file,
>>> so other PMDs can reuse it. Maybe out of scope for this patch.
>>>
>>
>> I don't think we need MT-safe reset; the patch is already out to
>> document the current status.
>
> Well, what you are working on is an MT-safe reset, in the sense that
> it allows one (1) resetting thread to properly synchronize with
> multiple concurrent counter-updating threads.
>
> It's not going to be completely MT-safe, since you can't have two
> threads calling the reset function in parallel.
>

This is what I meant by "MT-safe reset": multiple threads are not
allowed to call stats reset in parallel. And for the case of multiple
concurrent counter-updating threads, the suggestion is to stop
forwarding first.
The above two points are the updates I added to the
'rte_eth_stats_reset()' API, and I believe we can continue with this
restriction for the API.

> Any change to the API should make this clear.
>
>> For HW stats, reset is already reliable, and for SW drivers the
>> offset-based approach can make it reliable.
>>
>> Unless you explicitly asked for it, I don't think this is on the
>> agenda of the techboard.
>>
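To make variant (b) above concrete, a small sketch wired to the
counter helpers from earlier in the thread. It assumes pkt_rx_queue
holds struct counter rx_pkts/rx_bytes as suggested; receive_one() is a
hypothetical placeholder for the driver's per-packet receive logic,
not af_packet code:

    #include <stdint.h>
    #include <rte_mbuf.h>

    static uint16_t
    rx_burst(struct pkt_rx_queue *rxq, struct rte_mbuf **bufs,
        uint16_t nb_pkts)
    {
        uint16_t num_rx = 0;
        uint64_t num_rx_bytes = 0;
        uint16_t i;

        for (i = 0; i < nb_pkts; i++) {
            /* receive_one() is hypothetical. */
            struct rte_mbuf *mbuf = receive_one(rxq);

            if (mbuf == NULL)
                break;

            bufs[i] = mbuf;
            num_rx++;
            num_rx_bytes += rte_pktmbuf_pkt_len(mbuf);
        }

        /* Variant (b): accumulate in stack temporaries and fold them
         * into the counters once per burst, so the (relaxed atomic)
         * stores are not emitted per packet. An empty poll still costs
         * a store, which is the only difference from variant (c). */
        counter_add(&rxq->rx_pkts, num_rx);
        counter_add(&rxq->rx_bytes, num_rx_bytes);

        return num_rx;
    }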