On 5/2/2024 6:51 AM, Mattias Rönnblom wrote:
> On 2024-05-01 18:19, Ferruh Yigit wrote:
>> On 4/28/2024 4:11 PM, Mattias Rönnblom wrote:
>>> On 2024-04-26 16:38, Ferruh Yigit wrote:
>>>> For stats reset, use an offset instead of zeroing out the actual stats
>>>> values; get_stats() displays the diff between the stats and the offset.
>>>> This way, stats are only updated in the datapath and the offset is only
>>>> updated in the stats reset function. This makes the stats reset
>>>> function more reliable.
>>>>
>>>> As stats are only written by a single thread, we can remove the
>>>> 'volatile' qualifier, which should improve performance in the datapath.
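>>>>
>>>> I.e., roughly the following (a sketch; 'nb_rx' is illustrative, the
>>>> field names are the ones added by this patch):
>>>>
>>>> /* datapath: single writer, plain load/add/store */
>>>> rxq->rx_pkts += nb_rx;
>>>>
>>>> /* get_stats(): report the diff between counter and offset */
>>>> stats->ipackets += rxq->rx_pkts - rxq->rx_pkts_offset;
>>>>
>>>> /* stats_reset(): only the offset is written */
>>>> rxq->rx_pkts_offset = rxq->rx_pkts;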
>>>>
>>>
>>> volatile wouldn't help you if you had multiple writers, so that can't be
>>> the reason for its removal. It would be more accurate to say it should
>>> be replaced with atomic updates. If you use neither volatile nor
>>> atomics, you have to consider whether the compiler can reach the
>>> conclusion that it does not need to store the counter value for future
>>> use *for that thread* - because otherwise, the store doesn't actually
>>> need to occur. Since DPDK statistics tend to work, it's pretty obvious
>>> that current compilers tend not to reach this conclusion.
>>>
>>
>> Thanks Mattias for clarifying why we need volatile or atomics even with
>> single writer.
>>
>>> If this is to be done 100% properly, the update operation should be a
>>> non-atomic load, a non-atomic add, and an atomic store. Similarly, for
>>> the reset, the offset store should be atomic.
>>>
>>
>> ack
>>
>>> Considering the state of the rest of the DPDK code base, I think a
>>> non-atomic, non-volatile solution is also fine.
>>>
>>
>> Yes, this seems to work in practice, but I guess it's better to follow
>> the above suggestion.
>>
>>> (That said, I think we're better off just deprecating stats reset
>>> altogether, and returning -ENOTSUP here meanwhile.)
>>>
>>
>> As long as reset is reliable (here I mean it resets stats on every call)
>> and doesn't impact datapath performance, I am in favor of continuing
>> with it. Returning 'not supported' won't bring any benefit to users.
>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> ---
>>>> Cc: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>>> Cc: Stephen Hemminger <step...@networkplumber.org>
>>>> Cc: Morten Brørup <m...@smartsharesystems.com>
>>>>
>>>> This update was triggered by the mailing list discussion [1].
>>>>
>>>> [1]
>>>> https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969...@lysator.liu.se/
>>>>
>>>> v2:
>>>> * Remove wrapping check for stats
>>>> ---
>>>>    drivers/net/af_packet/rte_eth_af_packet.c | 66 ++++++++++++++---------
>>>>    1 file changed, 41 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index 397a32db5886..10c8e1e50139 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -51,8 +51,10 @@ struct pkt_rx_queue {
>>>>        uint16_t in_port;
>>>>        uint8_t vlan_strip;
>>>>
>>>> -    volatile unsigned long rx_pkts;
>>>> -    volatile unsigned long rx_bytes;
>>>> +    uint64_t rx_pkts;
>>>> +    uint64_t rx_bytes;
>>>> +    uint64_t rx_pkts_offset;
>>>> +    uint64_t rx_bytes_offset;
>>>
>>> I suggest you introduce a separate struct for reset-able counters. It'll
>>> make things cleaner, and you can sneak in atomics without too much
>>> atomics-related bloat.
>>>
>>> struct counter
>>> {
>>>      uint64_t count;
>>>      uint64_t offset;
>>> };
>>>
>>> /../
>>>      struct counter rx_pkts;
>>>      struct counter rx_bytes;
>>> /../
>>>
>>> static uint64_t
>>> counter_value(struct counter *counter)
>>> {
>>>      uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>      uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);
>>>
>>>      return count - offset;
>>> }
>>>
>>> static void
>>> counter_reset(struct counter *counter)
>>> {
>>>      uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>
>>>      __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
>>> }
>>>
>>> static void
>>> counter_add(struct counter *counter, uint64_t operand)
>>> {
>>>      __atomic_store_n(&counter->count, counter->count + operand,
>>>                       __ATOMIC_RELAXED);
>>> }
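>>>
>>> Usage in the RX path would then be something like (a sketch; 'rxq',
>>> 'nb_rx' and 'nb_rx_bytes' are illustrative):
>>>
>>> /* once per burst, at the end of the RX function */
>>> counter_add(&rxq->rx_pkts, nb_rx);
>>> counter_add(&rxq->rx_bytes, nb_rx_bytes);
>>>
>>> with get_stats() and stats_reset() calling counter_value() and
>>> counter_reset(), respectively, per queue.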
>>>
>>
>> Ack for separate struct for reset-able counters.
>>
>>> You'd have to port this to <rte_stdatomic.h> calls, which prevents
>>> non-atomic loads from RTE_ATOMIC()s. The non-atomic reads above must be
>>> replaced with explicit relaxed atomic loads. Otherwise, if you just
>>> use "counter->count", that would be an atomic load with sequential
>>> consistency memory order on C11 atomics-based builds, which would result
>>> in a barrier, at least on weakly ordered machines (e.g., ARM).
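>>>
>>> A sketch of the ported version, assuming the counter fields are marked
>>> with RTE_ATOMIC():
>>>
>>> #include <rte_stdatomic.h>
>>>
>>> struct counter
>>> {
>>>      RTE_ATOMIC(uint64_t) count;
>>>      RTE_ATOMIC(uint64_t) offset;
>>> };
>>>
>>> static uint64_t
>>> counter_value(struct counter *counter)
>>> {
>>>      /* explicit relaxed loads; a plain read of an RTE_ATOMIC() field
>>>       * would be seq cst on enable_stdatomic=true builds */
>>>      uint64_t count = rte_atomic_load_explicit(&counter->count,
>>>                                                rte_memory_order_relaxed);
>>>      uint64_t offset = rte_atomic_load_explicit(&counter->offset,
>>>                                                 rte_memory_order_relaxed);
>>>
>>>      return count - offset;
>>> }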
>>>
>>
>> I am not sure I understand the above.
>> As load and add will be non-atomic, why not access them directly, like:
>> `uint64_t count = counter->count;`
>>
> 
> In case count is _Atomic (i.e., on enable_stdatomic=true builds), "count
> = counter->count" will imply a memory barrier. On x86_64, I think it
> will "only" be a compiler barrier (i.e., preventing optimization). On
> weakly ordered machines, it will result in a barrier-instruction (or an
> instruction-which-is-also-a-barrier, like in the example below).
> 
> #include <stdatomic.h>
> 
> int relaxed_load(_Atomic int *p)
> {
>     return atomic_load_explicit(p, memory_order_relaxed);
> }
> 
> int direct_load(_Atomic int *p)
> {
>     return *p;
> }
> 
> GCC 13.2 ARM64 ->
> 
> relaxed_load:
>         ldr     w0, [x0]
>         ret
> direct_load:
>         ldar    w0, [x0]
>         ret
> 
>


Do we need to declare 'count' as '_Atomic'? I wasn't planning to make the
variable _Atomic; that way, the assignment won't introduce any memory
barrier.


>> So my understanding is: remove `volatile`, load and add without atomics,
>> and only use relaxed-ordered atomics for the store (to ensure the value
>> in the register is stored to memory).
>>
> 
> Yes, that would be the best option, if the DPDK atomics API allowed its
> implementation - but it doesn't. At least not if you care about what
> happens in enable_stdatomic=true builds.
> 
> The second-best option is to use a rte_memory_order_relaxed atomic load,
> a regular non-atomic add, and a relaxed atomic store.
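> 
> I.e., something like this (a sketch, reusing the RTE_ATOMIC() 'struct
> counter' from the earlier sketch):
> 
> static void
> counter_add(struct counter *counter, uint64_t operand)
> {
>     /* relaxed atomic load, non-atomic add, relaxed atomic store */
>     uint64_t count = rte_atomic_load_explicit(&counter->count,
>                                               rte_memory_order_relaxed);
> 
>     rte_atomic_store_explicit(&counter->count, count + operand,
>                               rte_memory_order_relaxed);
> }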
> 
>> I will send a new version of RFC with above understanding.
>>
>>> I would still use a struct and some helper-functions even for the less
>>> ambitious, non-atomic variant.
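>>>
>>> E.g., the non-atomic variant could be as simple as (a sketch):
>>>
>>> static uint64_t
>>> counter_value(struct counter *counter)
>>> {
>>>      return counter->count - counter->offset;
>>> }
>>>
>>> static void
>>> counter_add(struct counter *counter, uint64_t operand)
>>> {
>>>      counter->count += operand;
>>> }
>>>
>>> static void
>>> counter_reset(struct counter *counter)
>>> {
>>>      counter->offset = counter->count;
>>> }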
>>>
>>> The only drawback of using GCC built-in atomics here, versus an
>>> atomic- and volatile-free approach, is that current compilers seem to
>>> refuse to merge atomic stores. It's beyond me why this is the case. If
>>> you store to a variable twice in quick succession, it'll be two store
>>> machine instructions, even in cases where the compiler *knows* the value
>>> is identical. So you get volatile behavior, even though you didn't ask
>>> for it. Weird.
>>>
>>> So if you have a loop, you may want to do a "counter_add()" at the end
>>> from a temporary, to get the final 0.001% of performance.
>>>
>>
>> ack
>>
>> I can't really say which one of the following is better (because of the
>> store on an empty poll), but I will keep it as it is (b.); a concrete
>> sketch of b. follows below:
>>
>> a.
>> for (i < nb_pkt) {
>>     stats += 1;
>> }
>>
>>
>> b.
>> for (i < nb_pkt) {
>>     tmp += 1;
>> }
>> stats += tmp;
>>
>>
>> c.
>> for (i < nb_pkt) {
>>     tmp += 1;
>> }
>> if (tmp)
>>     stats += tmp;
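>>
>> For b., concretely, it would be something like (a sketch; names
>> illustrative):
>>
>> uint64_t nb_rx_bytes = 0;
>>
>> for (i = 0; i < nb_pkts; i++) {
>>     ...
>>     nb_rx_bytes += bufs[i]->pkt_len;
>> }
>>
>> /* one store per counter per burst, from the temporaries */
>> rxq->rx_pkts += nb_pkts;
>> rxq->rx_bytes += nb_rx_bytes;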
>>
>>
>>
>>> If the tech board thinks MT-safe reset-able software-manage statistics
>>> is the future (as opposed to dropping reset support, for example), I
>>> think this stuff should go into a separate header file, so other PMDs
>>> can reuse it. Maybe out of scope for this patch.
>>>
>>
>> I don't think we need an MT-safe reset; the patch to document the
>> current status is already out.
> 
> Well, what you are working on is an MT-safe reset, in the sense that it
> allows one (1) resetting thread to properly synchronize with multiple
> concurrent counter-updating threads.
> 
> It's not going to be completely MT-safe, since you can't have two
> threads calling the reset function in parallel.
> 

This is what I meant by "MT-safe reset": multiple threads are not allowed
to call stats reset in parallel.

And for the case of multiple concurrent counter-updating threads, the
suggestion is to stop forwarding.

The above two are the updates I added to the 'rte_eth_stats_reset()' API
documentation, and I believe we can continue with this restriction for
the API.

> Any change to the API should make this clear.
> 
>> For HW, stats reset is already reliable, and for SW drivers the
>> offset-based approach can make it reliable.
>>
>> Unless you explicitly asked for it, I don't think this is on the agenda
>> of the techboard.
>>
>>
