> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 08.35
> 
> On 2024-05-07 21:19, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >> Sent: Tuesday, 7 May 2024 09.24
> >>
> >> On 2024-04-28 17:11, Mattias Rönnblom wrote:
> >>> On 2024-04-26 16:38, Ferruh Yigit wrote:
> >
> > [...]
> >
> >>> static uint64_t
> >>> counter_value(struct counter *counter)
> >>> {
> >>>       uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
> >>>       uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);
> >>>
> >>
> >> Since the count and the offset are written to independently, without any
> >> ordering restrictions, an update and a reset in quick succession may
> >> cause the offset store to be globally visible before the new count.
> >
> > Good catch.
> > This may happen when a thread calls stats_add() and then the same thread
> > calls stats_reset().
> >
> >> In such a scenario, a reader could see an offset > count.
> >>
> >> Thus, unless I'm missing something, one should add a
> >>
> >> if (unlikely(offset > count))
> >>    return 0;
> >>
> >> here. With the appropriate comment explaining why this might be.
> >>
> >> Another approach would be to think about what memory barriers may be
> >> required to make sure one sees the count update before the offset
> >> update, but, intuitively, that seems like both more complex and more
> >> costly (performance-wise).
> >
> > I think it can be done without affecting stats_add(), by using "offset"
> > with Release-Consume ordering:
> >   - stats_reset() must write "offset" with memory_order_release, so the
> >     preceding update of "counter" cannot become visible after it, and
> >   - stats_get() must read "offset" with memory_order_consume, so no reads
> >     or writes in the current thread dependent on "offset" can be reordered
> >     before this load, and writes to "counter" (a data-dependent variable)
> >     in other threads that release "offset" are visible in the current
> >     thread.
> >
> 
> That was the kind of complexity I was thinking about. Those barriers
> come with a non-zero cost, both because different instructions are used
> and because compiler optimizations are prevented.

Yep, you mentioned that there might be a more complex alternative, so I decided 
to explore it. :-)

This approach doesn't impose any new requirements on stats_add(), so the data 
plane performance is not affected.
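
In code, I imagine something along these lines (just a sketch, reusing the
count/offset fields from your example; note that current compilers typically
strengthen __ATOMIC_CONSUME to __ATOMIC_ACQUIRE):

#include <stdint.h>

struct counter {
	uint64_t count;
	uint64_t offset;
};

static void
counter_reset(struct counter *counter)
{
	uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

	/* Release: updates of count made before this point cannot become
	 * visible after the new offset. */
	__atomic_store_n(&counter->offset, count, __ATOMIC_RELEASE);
}

static uint64_t
counter_value(struct counter *counter)
{
	/* Consume (acquire in practice): performed before the count load,
	 * so a count at least as new as the released offset is observed. */
	uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_CONSUME);
	uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

	return count - offset;
}

With acquire semantics on the reader side, a reader that observes the new
"offset" also observes a "count" at least as large, so offset > count should
not be seen.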

For per-thread counters, stats_add() can store "counter" using 
memory_order_relaxed, or, if the architecture prevents tearing of 64-bit 
accesses, using a plain volatile store.
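
The volatile variant could look something like this (again just a sketch;
counter_add_volatile is a made-up name, it reuses the struct counter above,
and non-tearing aligned 64-bit accesses are assumed):

/* Assumes the platform performs aligned 64-bit loads/stores as single,
 * non-tearing accesses, and that there is only one writer per counter. */
static void
counter_add_volatile(struct counter *counter, uint64_t operand)
{
	volatile uint64_t *count = (volatile uint64_t *)&counter->count;

	*count = *count + operand;
}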

Counters shared by multiple threads must be atomically incremented using 
rte_atomic_fetch_add_explicit() with memory_order_relaxed.
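
For example (sketch only; struct shared_counter and shared_counter_add are
made-up names, and the fields are assumed to be declared with RTE_ATOMIC()
as expected by rte_stdatomic.h):

#include <stdint.h>
#include <rte_stdatomic.h>

/* Hypothetical layout for a counter shared by multiple writer threads. */
struct shared_counter {
	RTE_ATOMIC(uint64_t) count;
	RTE_ATOMIC(uint64_t) offset;
};

static void
shared_counter_add(struct shared_counter *counter, uint64_t operand)
{
	/* Atomic read-modify-write; relaxed ordering is sufficient, since
	 * no other data is synchronized through the counter. */
	rte_atomic_fetch_add_explicit(&counter->count, operand,
	    rte_memory_order_relaxed);
}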

> 
> >>
> >>>       return count - offset;
> >>> }
> >>>
> >>> static void
> >>> counter_reset(struct counter *counter)
> >>> {
> >>>       uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
> >>>
> >>>       __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
> >>> }
> >>>
> >>> static void
> >>> counter_add(struct counter *counter, uint64_t operand)
> >>> {
> >>>       __atomic_store_n(&counter->count, counter->count + operand, __ATOMIC_RELAXED);
> >>> }
> >
